MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
ARTIFICIAL INTELLIGENCE LABORATORY

AI Memo 377                                                        August 1976


REPRESENTATION AND RECOGNITION OF THE SPATIAL 
ORGANIZATION OF THREE DIMENSIONAL SHAPES 



D. Marr and H. K. Nishihara


ABSTRACT. A method is given for representing 3-D shapes. It is based on a hierarchy of stick figures
(called 3-D models), where each stick corresponds to an axis in the shape's generalized cone representation.
Although the representation of a complete shape may contain many stick figures at different levels of
detail, only one stick figure is examined at a time while the representation is being used to interpret an
image. By thus balancing scope of description against detail, the complexity of the computations needed
to support the representation is minimized. The method requires (a) a database of stored stick figures; (b)
a simple device called the image-space processor for moving between object-centered and viewer-centered
coordinate frames; and (c) a process for "relaxing" a stored model onto the image during recognition.
The relation of the theory to "mental rotation" phenomena is discussed, and some critical experimental
predictions are made.


This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute
of Technology. Support for the laboratory's artificial intelligence research is provided in part by the
Advanced Research Projects Agency of the Department of Defense under Office of Naval Research
contract N00014-75-C-0643.


Summary

1. A method is given for representing 3-D shapes. It is based on a hierarchy of stick figures
(called 3-D models), where each stick corresponds to an axis in the shape's generalised cone
representation.

2. By using stick figures to represent a shape and its parts at several levels of detail, a
representation is obtained that is intrinsically simple, yet which maintains its fidelity to an
arbitrary level of precision.

3. While the representation is being used to interpret an image, only one stick figure is
examined at a time. By thus balancing scope of description against detail, the complexity of
the computations needed to support the representation is minimized.

4. The structures and processes associated with the method are described. The most
important are (a) a database of stored stick figures, which are indexed in several ways; (b)
an image-space processor, which is a simple mechanism for moving between object-centered
and viewer-centered coordinate frames; and (c) a process for "relaxing" a stored model onto
the image during the recognition and representation of spatial orientation.

5. Some facets of the theory's relaxation process resemble the computation of a 3-D rotation,
but a computer graphics metaphor is misleading. In fact the manipulations take place on
abstract vectors (the sticks) that are not even present in the original image, and it is roughly
correct to say that only two such vectors are explicitly represented at a time.

6. If the method is taken as a psychological theory, it makes a critical prediction which, if
false, would disprove it. Views of an object in which an important axis of its generalised
cone representation is severely foreshortened are peculiarly difficult to interpret. Such views
are not uncommon, and it is predicted that this class of views corresponds to those that
Warrington & Taylor (1973) labelled "unconventional". Their patients should therefore fail
on these views.

7. The theory provides an explanation of most of the experimental results concerning mental
rotation that have recently been discovered by R. N. Shepard and his colleagues. The linear
dependence between time to interpret and 3-D angular discrepancy is however not a deep
consequence of the theory, merely the signature of implementing it in a particularly simple
way.


I: Introduction 

At some point during the analysis of a two dimensional image of an
object, the three-dimensional structure of the viewed object and its spatial relation to the
viewer must be established and represented. The question is how? The form of the answer
we require is not a detailed specification of some complex neurophysiological mechanism,
although eventually one will wish to derive such a thing. First, we need a more abstract
understanding of the computational problems involved, that shows when and how to use the
various kinds of information that are available from an image. The understanding that we
seek may be expressed as a method (see Marr 1976a); it amounts to a competence theory for
this aspect of 3-D vision.

This article presents such a method, and it has four key ingredients:

(a) The deep structure of the three-dimensional representation of an object's shape consists
of a coarse stick figure, whose sticks correspond to axes of the major components of the
shape (such as arms, torso, head); and of individually addressable stick figures for each of
the component shapes. In this way, arbitrary detail can be represented in a system each of
whose component stick figures is rather simple, yet which maintains faithfully the important
shape characteristics at each level of description.

(b) Each stick figure is defined by a propositional database called a 3-D model. The
geometrical structure of a 3-D model is specified by storing the relative orientations of pairs
of connecting sticks. Thus the specification is made in a local coordinate system based on
the principal component of the shape at that level of description, not in absolute coordinates
based on a circumscribing frame of reference.

(c) When a 3-D model is being used to interpret an image, a computation must be made that
relates the geometrical relationships among the sticks of the 3-D model to the 2-D
relationships among the projections of those sticks in the image. The computation depends
upon the orientation and location of the 3-D model relative to the viewer. This is
accomplished by a computationally simple mechanism called the image-space processor, which
may be thought of as a device for transforming a vector between object-centered and
viewer-centered coordinate systems.

(d) During recognition, a sophisticated interaction takes place between the image, the 3-D
model, and the image-space processor. This interaction gradually relaxes the stored 3-D
model so that its axes project onto the axes computed from the image. Some facets of this
process resemble the computation of a 3-D rotation, but a simple computer graphics
metaphor is misleading. In fact, the rotations take place on abstract vectors (the axes) that
are not even present in the original image, and it is roughly correct to say that only two
such vectors are explicitly represented at a time.
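
The kind of computation named in ingredient (c) can be illustrated with a minimal sketch. It is not the memo's image-space processor: it assumes an orthographic projection, a viewer frame with x to the right, y up and z along the line of sight, and a single rotation about the vertical axis. The function names are hypothetical.

```python
import math

def object_to_viewer(vector, yaw_deg):
    """Rotate an object-centered vector about the vertical (y) axis.

    The result is expressed in an assumed viewer-centered frame:
    x to the right, y up, z along the line of sight.
    """
    a = math.radians(yaw_deg)
    x, y, z = vector
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))

def project(vector):
    """Orthographic projection onto the image plane: drop the depth term."""
    x, y, _ = vector
    return (x, y)

def projected_length(vector, yaw_deg):
    """Length of a stick as it appears in the image."""
    px, py = project(object_to_viewer(vector, yaw_deg))
    return math.hypot(px, py)

stick = (1.0, 0.0, 0.0)  # a unit stick along the object's x axis
for yaw in (0, 60, 90):
    print(yaw, round(projected_length(stick, yaw), 3))
```

As the stick turns toward the line of sight its projected length shrinks toward zero, which is the foreshortening that the theory predicts makes some views difficult to interpret.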

Thus the essence of the theory is a method for representing the spatial
disposition of the parts of an object and their relation to the viewer. We believe that it may
shed some light on the phenomena of mental rotation uncovered by R. N. Shepard and his
collaborators, and on certain neurological findings reported by Warrington & Taylor (1973).

[illegible] decomposition of the recognition process

Our overall picture of the recognition problem is illustrated in figure 1,
which embodies two points that we take as assumptions. Firstly, to a first approximation the


[Figure 1 diagram, top to bottom: 3-D model representation of 3-D shape (3-D model catalog,
image-space processor); axis-based description of shapes in image (Marr 1976c, Vatan & Marr
1976); regions, corresponding in simple images to objects; primal sketch (Marr 1976b); image.]


Figure 1. This diagram summarizes our overall view of the visual recognition problem, and
it embodies several points that this article takes as assumptions. The first is that the
recognition process decomposes to a set of modules that are to a first approximation
independent. The simplified subdivision shown here consists of four main stages, each of
which may contain several modules. (1) The translation of the image into a primitive
description called the primal sketch (Marr 1976b); (2) The division of the primal sketch into
regions or forms, through the action of various grouping processes ranging in scope from
the very local to global predicates like a rough type of connectedness; (3) The assignment of
an axis-based description to each form (see figure 4); and (4) The construction of a 3-D
model for the viewed shape, based initially on the axes delivered by (3). The relation
between the 3-D model representation of a shape and the image of that shape is found and
maintained with the help of the image-space processor. Finally, the representation of the
geometry of a shape is separate from the representation of the shape's use or purpose
(Warrington & Taylor 1973).




process of visual recognition decomposes to a set of modular steps. The evidence for this is
extensive but indirect. It includes evidence from electrophysiological recordings from
Adrian (1941) to Hubel & Wiesel (196?, 1965), Barlow, Blakemore & Pettigrew (1967), and Zeki
(19??); histological and neuroanatomical evidence from Brodmann (1909) and Cajal (1911) to
modern studies such as those of Zeki (1971), Allman, Kaas, Lane & Miezin (1973), Allman,
Kaas & Lane (1973), Allman & Kaas (1974a, b & c) and the mass of clinical studies describing
patients who have lost particular and highly circumscribed functional parts of their
perceptual or motor faculties (Critchley 1953, Luria 1970, Vinken & Bruyn 1969). Evidence
against the assumption of modularity in its strictest form comes from illusions in which
quite late processing or high-level knowledge about an image appears to influence earlier
processing. For example, shape recognition normally follows figure-ground separation, but
can sometimes influence it (e.g. Street 1931). According to the assumption of modularity,
these effects should be regarded as second-order interactions between modules that are to a
first approximation independent (Marr 1976b).

Secondly, we assume that there exists a module (or group of modules)
that is concerned with describing the 3-D shape of an item, and that this module is separate
from the representation of an item's functional semantics. The evidence for this is a
penetrating analysis by Warrington & Taylor (1973), who concluded that these two functions
reside in distinct cortical areas. Patients with left parietal lesions showed disorders related to
the use and purpose of an object, but their ability to recognize and represent its 3-D shape
appeared to be intact. The opposite was true of patients with right parietal lesions.

This article describes a theory of the representation and recognition of
3-D shape. Some parts of it, including the 3-D representation scheme and the image-space
processor, are precisely defined. Other parts, for example those concerned with database
access during recognition, are not yet rigorous. The reader will recognize that the looser
parts of the theory are those that are closely intertwined with other modules that we have
not yet studied, and cannot be made precise until the exact nature of those modules, and
what they can deliver from an image, has been defined. We recognize the shortcomings in
this account that arise for this reason, but believe that in order to rectify them one has to
have a clear grasp of a larger portion of the overall recognition process than the particular
3-D module described here. The theory as described here, together with other work (Marr
1976b, Marr 1976c, Vatan & Marr 1976, Marr & Poggio 1976a, Ullman 197?) summarised
briefly by Marr & Poggio (1976b), represents an attempt at decomposing the vision problem
into modules. Study of the interactions between modules must follow this.

General nature of the 3-D representation

Methods for deriving and manipulating the representation of a 3-D
shape depend heavily on the nature of the representation used. Our first task is therefore
to discover which representation is most appropriate. There are four ideas current in the
literature: the "multiple view" representation described by Minsky (1975), Baumgart's (1975)
representation by polyhedral approximation, the "generalized cylinder" representation
proposed by Binford (1971), and Blum's (1973) "symmetric axis" representation, which is
similar to the generalised cylinder representation for 2-D shapes, but differs from it in three


Figure 3. This figure is taken from figure 1 of Baumgart (1975), and illustrates his
representation of 3-D shape by polyhedral approximation. From three views of a plastic
horse, the silhouettes (a), (b) and (c) were obtained. A 3-D structure was computed from
these silhouettes by a cone intersection technique, and the polyhedral representation of the
resulting shape is illustrated in (d). Various disadvantages of this representation, of which
the most severe is its lack of uniqueness, combine to render it an unlikely candidate for the
psychological representation of 3-D shape.

dimensions. 


The multiple view representation is based on the insight that if one
chooses one's primitives correctly (e.g. the "side" of a cube), the number of qualitatively
different views of an object may be quite small. Minsky (1975) proposed that the
representation of a 3-D shape might therefore consist of a catalogue of the different
appearances of that shape, and that this catalogue would not need to be too large. The
multiple view representation is at present underdefined - for example, are all "views" of a
man the same in which the same limbs are visible but arranged in different positions? -
and so it is difficult to argue cogently against it. Nevertheless something of a case against it
can be made from Warrington & Taylor's findings. The side view of a water pail is
very different from the top view, and both are reasonably simple (see figure 2). Since both
views are probably equally common, one would expect the multiple view representation to
contain and (presumably) to have indexed both of them. If the lesions of Warrington &
Taylor's patients had randomly damaged a multiple view representation, one would expect
some patients to have lost one view, and others, another. But the finding is that all patients
are impaired on the same view (the one from above), views that Warrington & Taylor called
"unconventional". Although the multiple view representation is not absolutely incompatible
with these findings, strong extra assumptions are needed to incorporate them.

Baumgart (1975) has proposed using a system of polyhedral
approximations to 3-D shapes (see figure 3). The motivation for this is that computer
graphics systems make it easy to manipulate representations constructed of straight edge
segments, and the comparison between the expected view and the actual view of a
polyhedral structure is therefore feasible. He makes no claims that this representation has
any psychological importance, however, and the features that make it attractive for machine
vision tend to make it an unattractive candidate for psychology. Although Baumgart has
addressed with some success the problem of constructing a 3-D model from several views of
an object, he has not shown how to recognize a known model from just one monocular view.
More seriously, there is no real sense of uniqueness in his representation. A horse shape can
be approximated in many ways by polyhedra, and there is no guarantee that the
representations obtained on two different occasions from different sets of views will be
homologous. A representation that lacks a strong uniqueness condition will be almost useless
for recognition. There are also other difficulties with polyhedral approximation. They
include the lack of any natural representation of articulation of parts of an object (e.g. arms
and legs); the difficulty of answering overall questions about an object, like where it is
pointing, given only a set of polyhedra each of which describes some small part; and the
complex way in which joins between polyhedra have to be specified. As a candidate for
psychology, this representation at present seems to have no particular advantages and
several disadvantages. We shall therefore not consider it further.

A generalized cylinder is the surface swept out by moving a cross-section
along an axis. The axis need not be straight, and the cross-section may vary. The
generalised cylinder representation of an object is obtained by splitting it up into
components each of which is described in this way. A generalized cone is a generalized
cylinder in which the shape of the cross-section remains constant but for smooth variations


Figure 2. (a) and (b) show two views of a water-pail. Warrington & Taylor's patients
are impaired on (b), but not on (a). This is difficult to reconcile with Minsky's (1975)
multiple view representation, since both views are about as common. It is consistent with the
3-D model representation, for reasons that are clear from (c) and (d). The outlines of the
original figures are shown as thin lines, and the axis is shown as a thick one. This axis is
directly recoverable from image (a), but not from (b) where it is severely foreshortened.
Since the 3-D model representation relies on an explicit representation of this axis, the
successful recognition of views like (b) requires considerable extra computation.

in size.
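
The definition of a generalized cone can be made concrete with a minimal sketch. It assumes a straight axis along z and a circular cross-section whose radius is a smooth function of position along the axis. These are simplifying assumptions made here for illustration; the function and parameter names are hypothetical.

```python
import math

def generalized_cone(axis_length, radius_at, n_axis=5, n_around=8):
    """Sample surface points of a cone swept along a straight z axis.

    radius_at(t) returns the cross-section radius at the fraction t in
    [0, 1] of the way along the axis. The cross-section shape (a circle)
    stays constant; only its size varies.
    """
    points = []
    for i in range(n_axis):
        t = i / (n_axis - 1)
        r = radius_at(t)
        for j in range(n_around):
            phi = 2.0 * math.pi * j / n_around
            points.append((r * math.cos(phi), r * math.sin(phi), t * axis_length))
    return points

# A regular cylinder: the size of the cross-section does not vary at all.
cylinder = generalized_cone(2.0, lambda t: 0.5)

# A tapered cone: the radius shrinks smoothly to half its starting value.
tapered = generalized_cone(2.0, lambda t: 0.5 * (1.0 - 0.5 * t))
```

A regular cylinder is the special case in which `radius_at` is constant.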


Agin (1973) and Nevatia (1974) used a laser range-finding technique to
obtain the generalized cylinder representation of objects such as a Barbie doll, a snake and
a hen. Hollerbach (1975) showed how contour information may be used to derive the
generalised cylinder representation of a wide range of pottery, and he found that the
descriptive terminology for such artifacts in the archaeological literature corresponds
naturally to terms that appear in the generalized cylinder representation. Marr (1976c) has
proved that certain assumptions, which are implicit in the derivation of shape from contour,
are equivalent to assuming that the viewed shapes are composed of generalized cones; and
Vatan & Marr (1976) have constructed algorithms for segmenting the monocular image of a
shape into its generalized cone components (see figure 4).

Blum (1973) has developed a geometry of shape based on the notion of
growth outward from a point. In two dimensions, his representation may be obtained by
imagining a fire lit at all points around an outline. The fire from opposite "sides" of a
figure will meet in the middle, along what Blum calls the figure's "symmetric axis". The
representation consists of inverting this process, specifying the symmetric axis and the
degree of growth outward from each point on it.

For two-dimensional shapes, this representation resembles the generalized
cylinder representation, although it is not identical. For three dimensions however, the
"symmetric axis" may be two-dimensional, so this representation differs from generalized
cylinders in a substantial way. Of the two representations, generalized cones seem to be
preferable because for three-dimensional surfaces they are simpler, and because of their
intimate connection with assumptions that are implicit in the interpretation of occluding
contours in an image (Marr 1976c).

The generalized cone representation introduces two main problems:
obtaining the axes and the cross-sections of the different parts of an object (arms, legs,
torso), and representing the spatial disposition of the components thus obtained. These tasks
are nearly independent, and this article is concerned only with the second of them, how to
represent the arrangement in space of the different cones into which the viewed shape is
decomposed. To solve this problem, it is enough to represent the spatial dispositions of the
axes that occur in an object's generalized cone representation, which is equivalent to the
problem of describing stick figures - models made out of pipe-cleaners, one for each axis
(see figure 5). Such models exhibit only the lengths and disposition of axes in the
generalized cylinder representation, yet we can easily discern the giraffe, ostrich and goat in
the figure. That their recognition is so easy makes it reasonable to suppose that we
ourselves decompose the 3-D representation problem into similar components.

III: The Structures of the theory

The theory consists of a method for determining and representing the
three-dimensional dispositions of a stick figure's axes for the purpose of recognition, given
only a two-dimensional projection of those axes. It rests on the interplay between the image
and two other structures: a database of stored representations of shapes (the 3-D models),
and a mechanism for performing coordinate transforms (the image-space processor). The


Figure 4. Analysis of a contour from Vatan and Marr (1976). The outline (a) was obtained
by applying local grouping operations to a primal sketch (Marr 1976b). It is then smoothed,
and divided into convex and concave components (b). The outline is searched for deeply
concave points or components, which correspond to strong segmentation points. One such
point is marked with an open circle in (c). There are usually several possible matching
points for each strong segmentation point, and the candidates for the marked point are
shown here by filled circles (c). The correct mates for each segmentation point can usually
be found by eliminating relatively poor candidates. The result of doing this here is the
segmentation shown in (d). Once these segments have been defined, their corresponding
axes (thick lines) are easy to obtain (e). They do not usually connect, but may be related to
one another by intermediate lines which are called embedding relations (thin lines in f).
According to the present theory, the resulting stick figure (f) is the deep structure on which
interpretation of this image is based.

Figure 5. The theory asserts that the 3-D representation of a shape is decomposed into two
parts, the description of the cross-sections that occur in the shape's generalised cone
representation, and the disposition of the axes of these cones in space. Our theory deals
with the second problem, which is essentially the problem of describing stick figures. The
shapes in these pictures were made out of pipe-cleaners. The reader will have no trouble in
recognising the giraffe, goat, rabbit and ostrich. That their recognition is so easy makes it
reasonable to suppose that at some stage, we ourselves decompose the 3-D representation
problem into similar components.





Figure 6. Example of 3-D models, and their arrangement into the 3-D model representation
of a human shape. A 3-D model consists of a model axis and component axes (left and
right figures respectively in box labeled HUMAN), the latter consisting of a principal axis
(the torso) and several auxiliary axes (the head and limbs) whose positions are described
relative to the principal axis. The complete human 3-D model is enclosed in the rectangle
labeled HUMAN. The 3-D model representation is obtained by concatenating 3-D models
for different parts at different levels of detail. This is achieved by allowing a component
axis of one 3-D model to be the model axis of another. Here, for example, the arm
auxiliary axis in the human 3-D model acts as the model axis for the arm 3-D model, which
itself has two component axes (the upper and lower arms). The figure shows how this
scheme extends downwards as far as the fingers.

[Figure 6 panel labels legible in the source: HUMAN, FINGER, DISTAL FINGER.]

basic strategy of our approach rests on the principles of least commitment and graceful
degradation (see below and Marr 1976b) so that the method depends greatly on the analysis
of constraints that arise at different stages of the processing. In this section and the next,
we give a discursive account of the structures of our theory, and of the processes that
employ them. The appendix describes a particular computer implementation of the theory,
and gives an example of its application.

The 3-D model representation of shape

Our representation of 3-D shape is based on the idea of a stick figure,
where each stick is the axis of a generalized cone (as defined above). For the purpose of
this paper we shall limit ourselves still further, to regular cylinders in place of generalized
cones. The basic element in the description of shape is called a 3-D model and consists of:

(i) A model axis, which provides a very coarse specification of the general
size and orientation of the shape.

(ii) A small number (possibly zero) of component axes. The component axes
consist of a distinguished axis called the principal axis of the 3-D model,
and a number of auxiliary axes. The dispositions of the auxiliary axes are
defined relative to the principal axis, and that of the principal axis is
defined relative to the model axis.

(iii) Associated with each axis is a shape description, which in the present
restricted theory consists of the specification of a cylinder.

For example, the 3-D model for the overall shape of a human has six component axes in
addition to the single model axis for the whole shape. The principal axis corresponds to the
torso, and the five remaining component axes correspond to the head and limbs that are
connected to it (see figure 6).
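
The three parts of a 3-D model listed above can be sketched as a small data structure. This is only an illustration under assumptions: the class and field names are hypothetical, the cylinder dimensions are invented placeholder values, and the spatial relations between axes are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Cylinder:
    """Shape description attached to an axis (restricted theory: a cylinder)."""
    length: float
    radius: float

@dataclass
class Axis:
    name: str
    shape: Cylinder

@dataclass
class Model3D:
    model_axis: Axis                        # (i) coarse size and orientation
    principal_axis: Optional[Axis] = None   # (ii) distinguished component axis
    auxiliary_axes: List[Axis] = field(default_factory=list)

    def component_axes(self) -> List[Axis]:
        """Principal axis first, then the auxiliaries; possibly empty."""
        if self.principal_axis is None:
            return []
        return [self.principal_axis] + self.auxiliary_axes

# Placeholder dimensions, chosen only so the example runs.
limb_names = ["head", "left arm", "right arm", "left leg", "right leg"]
human = Model3D(
    model_axis=Axis("human", Cylinder(1.8, 0.25)),
    principal_axis=Axis("torso", Cylinder(0.6, 0.18)),
    auxiliary_axes=[Axis(name, Cylinder(0.7, 0.06)) for name in limb_names],
)
print([axis.name for axis in human.component_axes()])
```

The human model has one model axis and six component axes, matching the example in the text.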

Although a single 3-D model is a simple structure, several may be
combined to create a description of arbitrary depth and complexity. This is achieved by the
concatenation rule for 3-D models, according to which a component axis of one 3-D model
serves as the model axis of another. By combining 3-D models in this way, one can
build up descriptions of a particular physical structure to whatever level of detail is
required. Such a description is called the 3-D model representation of a physical structure.

Figure 6 illustrates how model concatenation is used to create the 3-D
model representation of a human shape, and it exhibits the hierarchy that concatenation
induces. At the top level is the 3-D model for the overall human shape. As we saw above,
this contains a single cylinder description of the overall shape (based on the model axis),
and axes for each of the shape's six major components. The next level of detail contains
3-D models for each of these components. For example, the arm 3-D model consists of a
model axis, which coincides with the arm auxiliary axis in the human 3-D model, and two
component axes that correspond to the upper and lower arms. The hierarchy extends in
similar fashion through 3-D models for the lower arm, hand, and finger, and each step is
illustrated in figure 6. In this way, a 3-D model representation may be built to capture the
geometry of a shape to whatever level of detail is required.
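
The concatenation rule can be sketched as a catalog in which the name of a component axis doubles as the key of the 3-D model that refines it. The catalog layout, the function name, and the component names below the arm level ("forearm", "palm") are hypothetical; only the human, arm, lower arm, hand and finger levels come from the text.

```python
# Each entry lists the component axes of one 3-D model. A component axis
# whose name is also a key serves as the model axis of a finer 3-D model.
CATALOG = {
    "human": ["torso", "head", "arm", "arm", "leg", "leg"],
    "arm": ["upper arm", "lower arm"],
    "lower arm": ["forearm", "hand"],   # component names assumed
    "hand": ["palm", "finger"],         # component names assumed
}

def path_to(catalog, root, target):
    """Return the chain of 3-D models leading from root down to target."""
    if root == target:
        return [root]
    for axis in catalog.get(root, []):
        rest = path_to(catalog, axis, target)
        if rest:
            return [root] + rest
    return []

chain = path_to(CATALOG, "human", "finger")
print(chain)
```

Each step along the chain looks at a single catalog entry, so the unit handled at any one time stays small however deep the hierarchy grows.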

The underlying idea here is that in order to use the 3-D model
representation, the largest unit that has to be manipulated at any one time is small - a single
3-D model - yet the representation of any whole shape may be elaborate.

Thus the decomposition shown in figure 6 should be thought of not as
the process of successively refining a single description, but instead as a representation
system in which the balance between resolution and extent of description is flexible, and can
change rapidly according to the needs of the moment. For instance, one cannot examine the
fine detail of a hand without first reducing the scope of the examination to just the hand
3-D model. If the owner of the hand suddenly moves away, the focus of attention can quickly
be shifted to a model near the top of the hierarchy in figure 6, since that is the level of
description at which movements of the body as a whole are best described.

We have found the trade-off between scope and detail to be a useful one
for the processes studied by our theory, because the information preserved at each level of
the representation is just that needed by the processes that use this representation to
interpret an image. For example, in the analysis of a projected human figure, the
orientation of the torso relative to the viewer is computed using information about the
orientations and lengths of the limbs relative to the torso as they are projected in the image.
This is just the information that is represented by the human 3-D model. The same holds
true lower down, for 3-D models of smaller parts.

The important overall characteristics of the 3-D model representation
for shape are: (1) the description provided by each 3-D model is quite simple while still
possessing the shape information important to the processes that will use the 3-D model; (2)
this technique produces descriptions that are canonical over variations that are not
important in terms of recognition, at least for the animal shapes examined here; and (3) the
fidelity of the shape representations produced is easily improved, without changing existing
3-D models, by simply adding more 3-D models to the description to represent finer details.

The Structure of a 3-D Model

The important question for specifying the form of a single 3-D model is
the manner in which the relative dispositions of its axes are specified. There are three
candidate coordinate systems: viewer-centered, object-centered and local.

The viewer-centered system is the one in which comparisons with the
image have eventually to be made. The image, and hence the projected axes computed
from it, are forced by the laws of optics to be based on a spherical coordinate system
centered on the viewer. The difficulty with this system is that the descriptions produced
depend upon the orientation of the viewed object relative to the viewer. For example a
horse facing left produces an entirely different description from a horse facing right in the
image. Minsky's "multiple view" representation accepts this difficulty and attempts to deal
with each distinct view as a separate problem. A system based on the 3-D model idea
requires that the underlying representation be independent of the viewing angle. This
allows us to reject a viewer-centered coordinate system.

An object-centered coordinate system is one in which each axis that
occurs anywhere in the 3-D model representation of an object is specified in a
circumscribing frame of reference based, for example, on the top-level model axis of that
object. Such a system is a poor one for articulated shapes where axes are not rigidly
connected. For example, if one moves an arm, one's fingers usually move with it. If each
finger axis were represented solely by reference to the overall body axis, almost any
movement of a high-level 3-D model in the 3-D model representation would render obsolete
all information below that level in the hierarchy.

The natural choice is therefore to distribute the coordinate system,
making it local to each 3-D model. The position of the finger axis is specified relative to
the hand, which in turn is specified relative to the arm, and this, to the torso. In order to
discover the position of the finger relative to the torso, these intermediate relations need to
be examined and interpreted. The crucial advantage of local 3-D coordinate systems is that
they preserve the modularity of the 3-D model representation, which in turn enhances its
flexibility. Using this scheme, it is easy to represent an elephant with one leg replaced by an
automobile tyre, given 3-D models for an elephant and a tyre.
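
The benefit of local coordinates can be seen in a minimal sketch. It is a planar simplification made here for brevity, not the memo's 3-D scheme: each part stores only its origin and angle relative to its parent, and the part names, offsets and the function name are hypothetical.

```python
import math

# name -> (parent, origin offset in the parent's frame, angle relative
# to the parent in degrees). All values are invented for illustration.
parts = {
    "torso":  (None,    (0.0, 0.0), 0.0),
    "arm":    ("torso", (0.0, 1.0), 0.0),
    "hand":   ("arm",   (1.0, 0.0), 0.0),
    "finger": ("hand",  (0.2, 0.0), 0.0),
}

def pose_in_root(parts, name):
    """Compose the local relations up the chain to get (x, y, heading)
    of a part's origin in the frame of the root part."""
    chain = []
    while name is not None:
        parent, offset, angle = parts[name]
        chain.append((offset, angle))
        name = parent
    x = y = heading = 0.0
    for (ox, oy), angle in reversed(chain):
        a = math.radians(heading)
        x += ox * math.cos(a) - oy * math.sin(a)
        y += ox * math.sin(a) + oy * math.cos(a)
        heading += angle
    return x, y, heading

before = pose_in_root(parts, "finger")

# Raise the arm by 90 degrees: only the arm's own entry is rewritten.
parts["arm"] = ("torso", (0.0, 1.0), 90.0)
after = pose_in_root(parts, "finger")
print(before, after)
```

Moving the arm changes one stored relation, yet the finger's position relative to the torso follows automatically; the hand and finger entries are untouched.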

In order to specify the coordinate system for the 3-D model
representation, it therefore suffices to describe how the spatial dispositions of the axes in a
single 3-D model are determined relative to its principal axis. Figure 7 illustrates how this is
accomplished. The length and orientation of an auxiliary axis is specified in spherical
coordinates (inclination, girdle, size) or (θ, φ, r), where the principal axis itself defines the
unit vector (0, 0, 1.0). The precise position of the auxiliary is determined by specifying its
origin as a triple in cylindrical coordinates (embedding-girdle, embedding-distance, position)
or (φ, r, z) about the principal axis. Once again the axis itself is (0, 0, 1.0). For both of
these specifications, the direction of the zero girdle-angle, φ, has to be supplied in order to
fix the angular rotation about the principal axis. The set

(inclination, girdle, size, embedding-girdle, embedding-distance, position)

specifies the position of one cylinder relative to another, and it is called an adjunct relation.
Figure 7 shows the adjunct relation between the torso and left front leg 

■of a cow. Tbe kg stans at (-lOO*, 0.JT 0JJ, that is, at the front end of the torso, displaced 
away from the axis of the torso by the torso's radius and located slightly ventral to the left 
side From that point, the leg axis extends iri a ventral direction about ?/3 of Ihe torso's 

length fW . ISO*. fl.GG). Finally, the 'hscltness of the leg is much ks; that that of the torso 
The angles and lengths ihat occur in these relations are represented in a 
iystenr (hat specifies both 3 value and a tolerance {table I in the appendix). For example, it 
is pOSSibk to Stair "hat a particular axis (Lie Che leg of a -quadruped) is connected rather 
precisely at one eno of the torso, is approximately vertical with about a ten degree tolerance 
□til to the side fin girdle-angle), and a tolerance in Inclination of about TO degrees, which 
includes positions through which the leg normally swings. 
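
An adjunct relation and its tolerances might be held in a structure like the one below. This is a minimal sketch, not the memo's data format: the class names, the symmetric tolerance, and the conventions that the principal axis runs along +z from (0, 0, 0) to (0, 0, 1) with the zero girdle-angle along +x are all assumptions made for illustration, and the numbers in the example are invented rather than taken from figure 7.

```python
import math
from dataclasses import dataclass


@dataclass
class Toleranced:
    """A value with a symmetric tolerance (a simplification; table 1 may differ)."""
    value: float
    tol: float

    def admits(self, x: float) -> bool:
        return abs(x - self.value) <= self.tol


@dataclass
class AdjunctRelation:
    # Orientation and length of the auxiliary axis (spherical coordinates).
    inclination_deg: float
    girdle_deg: float
    size: float
    # Origin of the auxiliary axis (cylindrical coordinates about the principal axis).
    embedding_girdle_deg: float
    embedding_distance: float
    position: float

    def endpoints(self):
        """Start and end of the auxiliary axis in the principal axis's own frame."""
        eg = math.radians(self.embedding_girdle_deg)
        start = (self.embedding_distance * math.cos(eg),
                 self.embedding_distance * math.sin(eg),
                 self.position)
        inc = math.radians(self.inclination_deg)
        g = math.radians(self.girdle_deg)
        direction = (math.sin(inc) * math.cos(g),
                     math.sin(inc) * math.sin(g),
                     math.cos(inc))
        end = tuple(s + self.size * d for s, d in zip(start, direction))
        return start, end


# Invented example: an appendage attached at the far end of the principal axis,
# offset by 0.1 and pointing perpendicular to it on the girdle-angle-180 side.
rel = AdjunctRelation(90.0, 180.0, 0.66, 180.0, 0.1, 1.0)
start, end = rel.endpoints()

# Invented tolerance: roughly vertical, give or take ten degrees.
girdle_spec = Toleranced(180.0, 10.0)
```

Storing a tolerance beside each value lets one relation cover the whole range through which a limb normally swings, instead of one entry per pose.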

The Image-Space Processor

We have seen how structural information about a shape is held by its 3-D
model representation in a coordinate system that is essentially distributed. We also noticed
that information from the image is expressed in a viewer-centered coordinate frame. These
two systems have to be related, and the mechanism for accomplishing this is called the
image-space processor.


Since our system for representing shape is based on 3-D models, each of
which is simply a set of axes organized around a principal axis, the computational
machinery needed in the image-space processor is very simple. It can be thought of as a
tabular or simple arithmetic device that is able to maintain the representation of a
distinguished vector, called the $axis, in viewer-centered spherical coordinates. In addition,
the image-space processor can represent one movable vector called the $spasar (for space-
arrow). The important point about the processor is that coordinates for the $spasar are
available simultaneously in a frame centered on the viewer and in one centered on the $axis,
so that specifying the $spasar in either frame makes it available in the other.

The $axis essentially defines a local coordinate system. It is specified by
its two endpoints, and by one other point that defines the zero girdle-angle. The $spasar is
defined by its two endpoints. Thus the image-space processor takes five points specifying
the $spasar and $axis in the viewer-centered system and produces an adjunct relation
specifying the disposition of the $spasar relative to the $axis. The reverse transform, also
computed by the image-space processor, takes a specification of the $axis and a relation
specifying the $spasar relative to the $axis, and produces the coordinates of the $spasar's
end points in the viewer-centered system. Since the viewer-centered system is expressed in
spherical coordinates, predicted projections on the image may be obtained by simply
ignoring the radial component r.
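
The two transforms just described can be sketched as follows. This is an illustrative reading rather than the memo's implementation: the viewer frame is taken to be Cartesian with depth along z (so that projection drops z instead of the radial component r), the relation is returned as a plain dictionary, and the names `AxisFrame`, `relation_from_points`, `points_from_relation` and `project` are hypothetical. Degenerate inputs (a zero-length vector, or a girdle point lying on the axis) are not handled.

```python
import math


def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, k): return tuple(x * k for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


class AxisFrame:
    """Local frame implied by the $axis: two endpoints plus a zero-girdle point."""

    def __init__(self, start, end, girdle_point):
        v = sub(end, start)
        self.origin, self.length = start, norm(v)
        self.z = scale(v, 1.0 / self.length)
        g = sub(girdle_point, start)
        gx = sub(g, scale(self.z, dot(g, self.z)))   # remove the component along the axis
        self.x = scale(gx, 1.0 / norm(gx))
        self.y = cross(self.z, self.x)

    def to_local(self, p):
        """Viewer coordinates -> axis coordinates, in units of the axis length."""
        d = sub(p, self.origin)
        return tuple(dot(d, e) / self.length for e in (self.x, self.y, self.z))

    def to_viewer(self, q):
        v = add(add(scale(self.x, q[0]), scale(self.y, q[1])), scale(self.z, q[2]))
        return add(self.origin, scale(v, self.length))


def relation_from_points(frame, s_start, s_end):
    """Forward transform: $spasar endpoints in the viewer frame -> adjunct relation."""
    s, e = frame.to_local(s_start), frame.to_local(s_end)
    d = sub(e, s)
    size = norm(d)
    cos_inc = max(-1.0, min(1.0, d[2] / size))       # guard against rounding
    return {
        "inclination": math.degrees(math.acos(cos_inc)),
        "girdle": math.degrees(math.atan2(d[1], d[0])),
        "size": size,
        "embedding_girdle": math.degrees(math.atan2(s[1], s[0])),
        "embedding_distance": math.hypot(s[0], s[1]),
        "position": s[2],
    }


def points_from_relation(frame, rel):
    """Reverse transform: adjunct relation -> $spasar endpoints in the viewer frame."""
    eg = math.radians(rel["embedding_girdle"])
    s = (rel["embedding_distance"] * math.cos(eg),
         rel["embedding_distance"] * math.sin(eg),
         rel["position"])
    inc, g = math.radians(rel["inclination"]), math.radians(rel["girdle"])
    d = (math.sin(inc) * math.cos(g), math.sin(inc) * math.sin(g), math.cos(inc))
    e = add(s, scale(d, rel["size"]))
    return frame.to_viewer(s), frame.to_viewer(e)


def project(p):
    """Image projection under the Cartesian simplification: drop the depth."""
    return (p[0], p[1])


# Five points in the viewer frame: three for the $axis, two for the $spasar.
frame = AxisFrame((0, 0, 5), (2, 0, 5), (0, 1, 5))
rel = relation_from_points(frame, (2, 0, 5), (2, 1, 5))
back = points_from_relation(frame, rel)
```

Running the reverse transform on a relation read from a stored model, rather than on one just measured, is how a prediction such as the appearance of a neck axis would be produced from a known torso axis.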

An example will help to clarify these points. If the orientation and
location of the $axis relative to the viewer represents the torso axis of an imaginary horse
and the appearance of its neck axis is required, the appropriate adjunct relation, giving the
disposition of the neck axis relative to the torso axis, is read from the horse 3-D model and
the image-space processor is used to set the $spasar relative to the $axis as indicated by this
relation. This computation produces the coordinates of the $spasar, and thus the horse's
neck axis, in the viewer's reference frame, and its projection is obtained by omitting the
radial components.

In the simplest implementation of the image-space processor, the $axis is
a passive element. Rotating it or translating it in the viewer's space-frame requires the use
of the $spasar to compute its new coordinates. During recognition, two circumstances occur
that cause one to move the $axis. Firstly, the orientation of a 3-D model is adjusted
incrementally relative to the viewer until a disposition is found where the predictions from
the 3-D model agree best with those obtained from the image. And secondly, when a piece
of a 3-D model is to be examined in finer detail, one of the appendages of the model at the
current level of study will become the principal axis for a more specialised model that deals
with the fine structure of a sub-part. When shifting downwards to study the sub-part, the
$axis and its implied reference frame have to be moved to the new principal axis. For
example, when using the 3-D model for the overall structure of a man, the $axis will be
bound to the torso. In order to move to a model for one of the arms, the $spasar must first
be moved to that arm, and the $axis may then be transferred to the position computed by
the $spasar.

The Catalogue of 3-D Models

The 3-D model representation of shape has been defined, and we have
seen in principle how the image-space processor relates the specifications found in a 3-D
model representation to those being delivered from an image. The third major structure in
the theory is a catalogue of stored 3-D models (see figure 8), from which individual 3-D
models are freely selected and refined during the construction of the 3-D model
representation for a given physical shape. The catalogue is indexed in various ways, so that
incomplete shape information obtained during the analysis of an image causes a particular
3-D model to be selected, and this model, in turn, aids the further interpretation of the
image by providing constraints on the possible dispositions of the axes found there.

The 3-D model catalogue may be thought of as a vocabulary of shape
descriptions, and part of the process of recognition in our theory corresponds to the selection
of increasingly specific 3-D models at each level of the 3-D model representation that is
being built for the current image. Notice that making a 3-D model representation more
specific by substituting increasingly specialized 3-D models within it is distinct from
augmenting it with extra detail by adding new 3-D models to its fringes. In the first case,
one might for example switch from an overall 3-D model for a quadruped to one for a
horse; and in the second, one might add to the existing representation a 3-D model for a
wart in the middle of one flank.

The 3-D model catalogue is organised in a hierarchy of increasing
specificity. The topmost level contains the most undifferentiated description available,
which is the 3-D model for a single cylinder. It is the top-level description of every shape in
the catalogue. For this paper, we restrict the catalogue to those of a few animals, so at the
next level of detail, there is a general quadruped shape, a primate shape, a bird-like shape,
and various limbs. These schemata are very general; for example, the quadruped shape
specifies only that there are six appendages, with certain constraints on their positions and
dispositions, but with only a very general specification of the types of limbs involved.

The 3-D model catalogue does not respect the difference between 3-D
models for an object and its parts; its hierarchy simply traces lines of increasingly
specialized description. Thus, 3-D models for the component parts of an object (legs, arms,
ears, fingers, navels) are also arranged in the hierarchy of increasing specificity, while
sharing the same top-level description of a single cylinder. For example, the hierarchy for a
limb starts with the cylinder, next decomposes into two segments (like figure 3c), and each
segment has its own subdivisions. In addition to this, the "general" (i.e. undifferentiated)
limb 3-D model differentiates into forelimb and hindlimb, these into horse-forelimb,
cow-forelimb, etc. At each level of specificity, a 3-D model has internal references to component
sub-parts -- for example all limbs have upper and lower components -- and of course the
upper limb component of a horse-foreleg model differs from the upper component of a
human-arm model.
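
The two kinds of link described above (specialisation of a more general model, and reference to component sub-parts) can be kept apart in a small table. The sketch below is hypothetical throughout: the entry names, the `CatalogueEntry` fields and the `lineage` helper are illustrative choices, and the quadruped's components simply follow the seven axes listed in figure 9.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogueEntry:
    name: str
    parent: Optional[str]                                  # the more general model it specialises
    components: List[str] = field(default_factory=list)   # references to sub-part models


CATALOGUE: Dict[str, CatalogueEntry] = {}


def add(name, parent=None, components=()):
    CATALOGUE[name] = CatalogueEntry(name, parent, list(components))


# Objects and parts live in the same hierarchy; all lines start at the cylinder.
add("cylinder")
add("limb", "cylinder", ["upper-limb", "lower-limb"])
add("forelimb", "limb", ["upper-limb", "lower-limb"])
add("horse-forelimb", "forelimb", ["horse-upper-forelimb", "horse-lower-forelimb"])
add("quadruped", "cylinder", ["torso", "bust", "limb", "limb", "limb", "limb", "tail"])
add("horse", "quadruped",
    ["torso", "bust", "horse-forelimb", "horse-forelimb", "limb", "limb", "tail"])


def lineage(name):
    """Path from a model up through ever more general descriptions to the cylinder."""
    path = []
    while name is not None:
        path.append(name)
        name = CATALOGUE[name].parent
    return path


limb_line = lineage("horse-forelimb")
horse_line = lineage("horse")
```

Substituting "horse" for "quadruped" moves along the `parent` links, whereas adding detail to a representation extends the `components` lists; keeping the two separate mirrors the distinction drawn in the text.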

The extent of this repertoire of shapes affects the efficiency of the
computations for describing shapes presented to the system, but it does not limit one to them.
For example, if presented with a favourable view of a horse like that in figure [illegible], a very
limited system would be able to construct the description of its shape without the aid of a
quadruped model using only single cylinder models, but it would take more time than if the
quadruped model were available and used. Once the analysis of the shape in an image is


Figure 8. The 3-D model catalogue contains a repertoire of shapes organised from the
general to the specific. It is consulted several times during the analysis of an image, and
with its help a 3-D model representation of the viewed shape is constructed. At the top level
is the most general model of all, a single cylinder. At the next level are models for general
categories of shape; those listed here are for a quadruped, a primate, a bird and a limb. At
the next level of differentiation, specific types of these general categories are represented.
The constraints imposed by using a model at one level in the catalogue to interpret an
image often give sufficient new information to enable one to select correctly a more
specialised model. The organisation exhibited in this figure is orthogonal to the
organisation depicted in figure 6.


CYLINDER

LIMB    QUADRUPED    PRIMATE    BIRD

accomplished, the newly constructed 3-D models can be assigned to the catalogue as new
models to be used to help interpret subsequent images. This step involves a considerable
amount of indexing.

An important feature of the 3-D model catalogue is the extreme
flexibility with which individual 3-D models may be used during the construction of a 3-D
model representation for a given image. This is of course essential during the process of
recognition, where the descriptions of the different parts of an object evolve independently
to a certain extent. For example, one might at a particular instant be using a quadruped
model, with rather general associated leg, neck and head models supporting the analysis.
The constraints supplied by the head model allow a sufficient amount of new information
to be obtained from the image so that the newly specialized description can be used to access
the particular 3-D model for a horse-head directly via the catalogue's indexing mechanisms.
This then allows the developing representation to be further specified both through
improved specialization of the 3-D model selected for the whole animal's shape, and through
improved specialization of the models for other components of the shape such as the head
and legs.


III: The processes of the theory

We have seen how 3-D shapes are represented, and the mechanisms by
which this representation is translated into quantities that may be measured from an image.
We now turn to the more dynamical aspects of the theory, and these fall into two parts.
First, how does one select an appropriate 3-D model given only the 2-D stick figure derived
from an image? And second, having obtained a candidate 3-D model, how does its frame of
reference come to be specified accurately relative to the viewer's? The basic strategy of our
approach uses the principle of least commitment (Marr 1976b), which states that nothing
should be done that may later have to be undone. At each stage, action is based on
information and constraints that are reasonably certain, and is designed to produce new
information and fresh constraints that will help to guide the analysis towards the desired
goal.

This part of the theory is only outlined; in fact it lies almost outside the
3-D representation module, since information from many other modules and interactions
with them play an unavoidable role in the analysis of any but the simplest images.

The homology problems

1. Accessing a suitable 3-D model

The first problem is how to obtain a suitable 3-D model. The database
contains a large store of them, and we have to use information from the image to select one.
The stored 3-D models range in specificity from the very general to the very particular
(from a single cylinder to a giraffe), so that accessing the 3-D model database with a given
set of features would in general cause the indexer to return many possible models. The
principle of least commitment implies that one should never use a model that is more
specific than current knowledge warrants, so it is inappropriate to index very specific models
under very general attributes. Hence the access paths in the database behave more like a


a. 



b. 3-D MODEL $QUADRUPED

   Component Axes

   1 - $TORSO
   2 - $BUST
   3 - $LIMB
   4 - $LIMB
   5 - $LIMB
   6 - $LIMB
   7 - $TAIL


Figure 9. The homology problems. Previous visual processes deliver a datastructure like
that exhibited in (a), where each axis is associated with a cylinder width, and the
connectivity is explicitly available. The first homology problem is to select a suitable 3-D
model from the catalogue. The result of the computations carried out here is the assignment
of a $QUADRUPED 3-D model to this problem. Next, a homology must be established (so far
as possible) between the axes in the image and the component axes of the quadruped 3-D
model. The result of this step is shown in (b). At this point the viewing angle is still
unspecified, and only rather general information has been used to establish the homology
with this unspecified 3-D model.




decision tree than they would if every item were indexed independently. Once a general
model like a quadruped has been retrieved and used to describe the image, it forms a local
context through which more specialized features of that model can access more specialized
3-D models indexed under it.
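
One way to picture this decision-tree style of access is a descent that stops as soon as the available features no longer single out a more specific model. The sketch below is an assumption-laden illustration, not the memo's indexer: the feature names, the `requires` sets and the rule "descend only when exactly one child is warranted" are all invented for the example.

```python
# Hypothetical catalogue: each model lists its parent and the features that
# must already be established before the model may be selected.
CATALOGUE = {
    "cylinder":  {"parent": None,        "requires": set()},
    "quadruped": {"parent": "cylinder",  "requires": {"six-appendages"}},
    "bird":      {"parent": "cylinder",  "requires": {"two-legs", "wings"}},
    "horse":     {"parent": "quadruped", "requires": {"thick-neck", "angled-bust"}},
    "giraffe":   {"parent": "quadruped", "requires": {"neck-longer-than-torso"}},
}


def select_model(features, current="cylinder"):
    """Descend from the most general model while the features warrant it.

    Specific models are reachable only through their general parent, so a
    very general feature set can never retrieve a very specific model.
    """
    features = set(features)
    while True:
        warranted = [name for name, entry in CATALOGUE.items()
                     if entry["parent"] == current and entry["requires"] <= features]
        if len(warranted) != 1:
            return current          # nothing more specific is justified yet
        current = warranted[0]


nothing_known = select_model(set())
general = select_model({"six-appendages"})
specific = select_model({"six-appendages", "neck-longer-than-torso"})
```

Stopping at the most general warranted model is one simple reading of least commitment: a later, richer feature set can resume the descent without undoing anything.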

Suppose that one is presented with a stick-figure image like that in
figure 9a. To begin with, nothing is known about the perspective from which the object is
being viewed, so the initial 3-D model must be selected using information that is preserved
by perspective transformations. Connectivity is not destroyed by perspective
transformation, nor are quantities like the fractional distance down one axis at which
another connects to it, unless the object is being viewed from very close by. Spurious
connectivities can be introduced if one axis crosses in front of another and if the reason is
not recognised lower down, but existing connections cannot be destroyed, only obscured.
Hence in order to use connectivity information, when measuring which database items best
match a given configuration set, unexplained errors of omission are treated much more
seriously than unexplained errors of commission.
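
The asymmetry between the two kinds of error can be expressed as a weighted count. The sketch below rests on several assumptions that are not in the memo: an "omission" is read as a connection the stored model requires but the image lacks, a "commission" as a connection seen in the image that the model does not contain, sticks are assumed to carry trial labels so that links can be compared directly, and the weights 5.0 and 1.0 are arbitrary.

```python
def link(a, b):
    """An undirected connection between two named sticks."""
    return tuple(sorted((a, b)))


def connectivity_cost(model_links, image_links, explained=(),
                      omission_weight=5.0, commission_weight=1.0):
    """Penalise unexplained missing connections more than extra ones."""
    model_links, image_links = set(model_links), set(image_links)
    omissions = model_links - image_links - set(explained)   # required but not seen
    commissions = image_links - model_links                  # seen but not required
    return omission_weight * len(omissions) + commission_weight * len(commissions)


quadruped = {link("torso", part)
             for part in ("bust", "limb1", "limb2", "limb3", "limb4", "tail")}

# One leg crosses in front of another, adding a spurious connection.
crossing = quadruped | {link("limb1", "limb2")}
# The tail connection is absent from the image.
no_tail = quadruped - {link("torso", "tail")}

cost_crossing = connectivity_cost(quadruped, crossing)
cost_no_tail = connectivity_cost(quadruped, no_tail)
# The same absence costs nothing once it is explained, e.g. by occlusion.
cost_hidden = connectivity_cost(quadruped, no_tail, explained=[link("torso", "tail")])
```

With these weights a candidate model tolerates several accidental crossings before it scores worse than one that leaves a single required connection unaccounted for.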

The second sort of information is girdle-angles, inclinations, and the
relative lengths of axes. It is easier to take advantage of these later on, when the
image-space processor has delivered at least partial results about the three-dimensional orientation
relative to the viewer; but it is possible to do something with them early on. This comes
about through weak, gross clues. For example, if the 2-D length of the neck significantly
exceeds the apparent length of the "torso" in the image, and if the torso does not seem
abnormally foreshortened when compared with the length of the "legs", the image is likely to
be a giraffe. In other words, lower bounds on the lengths of limbs can often be inferred,
and are sometimes useful. Another important type of clue concerns major differences in the
girdle-angles of two axes that are connected to a common one. For example, the neck and
the tail often point in very different directions -- one up and one down -- and this obvious
difference can usually be seen without a sophisticated 3-D analysis. In a pipe-cleaner
animal, this very rough difference can help to determine which end of the animal is which.

The important point about the initial index access, and all subsequent
accesses until an adequate description has been built, is that the newly selected model is used
to structure information that is already available and is instrumental in obtaining further
shape information from the image. This added information is then used to select a more
specific model, and the process repeats itself until enough information is gathered for the
purpose at hand.

The path to a 3-D model is not always direct. When an important stick
in the stick-figure is foreshortened and component shapes are insufficient for determining
the 3-D model, other kinds of strategies are needed. An interesting example is a water-pail
(see figure 2). When seen from the side, the image of a pail segments naturally into its
generalized cylinder description in which the pail is represented as the slice of a cone and
the axis is vertical (figure 2c). If one looks down from above however, one essentially sees
two circles joined by the sloping sides. The principal axis of the pail would appear as a
point from this perspective (figure 2d), and if the pail's handle were missing or only vaguely
defined in the image, there would be no strong component clues to work with.


In order to access the correct 3-D model despite these obfuscations, some
idea of depth has to be introduced into the analysis before addressing the 3-D model index
can be successful. In the case of the pail, some process has to realise that the two circles
might be separated in depth, and that if they are, they could be separated by a considerable
distance. The clues that signal this in monocular images include radial symmetry and
nuances of shadow and highlight, which leads us to expect that much of the analysis of
lighting and shadow can influence the processing at exactly this stage of recognition. We
think of the computations that take place here as deploying the $spasar to construct from
the image a primary 3-D model, that consists at first of an axis in depth whose
circumscribing surface is bounded by the two visible circles, and to which extra details -- like
hollowness, the closure of one end of this surface by an orthogonal plane, and possibly the
addition of a cross-strut to account for the handle -- are added. At some point during the
construction of this description, the indexer is successful at finding a match with some near
antecedent of the bucket 3-D model in the catalogue. If an "unconventional view" becomes a
common view, it would become profitable to index the appropriate 3-D model under the
special features that obtain for that view.

2. Matching the image to a model

Once a 3-D model has been selected, its component axes must be paired
with sticks in the stick-figure image. Since the ways in which a 3-D model is selected vary
considerably, the association between these elements is not always automatic. Often, some of
the associations will remain ambiguous. For example, imagine the silhouette of a horse
from the side: the legs are easily identified but the left and right forelegs cannot be
distinguished without further information. What is important in many cases is that a
particular stick from the image is one of the legs, since the legs are roughly parallel and it is
their orientation rather than their specific identity that is important for computing the
figure's shape.

The information available for making these associations increases as the
processing proceeds. Initially, positional information along the principal axis of the stick
figure is depended upon most heavily. Often, clues that are available at this stage include
the relative thicknesses of the shapes round the stick axes (the neck of a horse is much
thicker than the legs), and the decompositions of component sticks (the tail and legs of a
horse may be roughly straight, but the bust has two components that always make a large
angle with one another). Symmetry or repetition can also be important for disambiguating
the components of a stick figure. For example, the legs of a horse are all the same thickness,
are roughly parallel, and because of this have roughly the same length in the stick-figure
image, distinguishing them from the tail. Also the legs and tail are usually on one side of
the torso while the bust extends to the other side in the image of a horse. Collectively, such
clues are often sufficient to disambiguate the major components of a 3-D model.

Relaxation 

The final part of the theory assumes that the image has been described
by a 3-D model with which a homology has been established, and describes how the model


Figure 10. Relaxing a model onto the stick figure derived from the image. Once a 3-D
model has been selected and associations have been made between the axes of the model
and the sticks computed from the image, the approximate orientation of the model relative
to the viewer is computed via a hill-climbing algorithm using the image-space processor.
This process is carried out with the $axis positioned so that its projection coincides with the
stick associated with the model's principal axis (as indicated by the double-headed arrows
above). With this arrangement, the appropriateness of a proposed principal axis orientation
can be judged by using the $spasar to compare the consequent projections of the model's
limbs with the associated sticks in the image. The $axis can be rotated in two dimensions
without moving its projection away from its assigned stick in the image. It can be dipped
toward or away from the viewer and it can be rotated about its own axis. In the figures
above, dark lines indicate sticks computed from the image and light lines are projections
computed using the $spasar. The top sequence (a), (b), (c) shows the projected axes of the
quadruped model for different rotations about the $axis while its ends are equidistant from
the viewer. In the lower sequence (d), (e), (f) the tail end of the $axis is moved slightly
farther away from the viewer.



Figure 11. Views of an object in which an important axis is foreshortened are surprisingly
common. From only one of these views of a camera (b), may its two main axes be recovered
straightforwardly from the image. Figures (d) through (f) show how this happens, by
displaying the axes for each of the views (a) - (c) within a line drawing of the overall shape.
Views (a) and (c) fall into the same class as the top view of a water-pail (figure 2b).
According to the theory, the class of such views provides a rigorous definition of the
intuitive notion of an "unconventional" view (Warrington & Taylor 1973).
























comes to possess the appropriate 3-D orientation relative to the viewer. This is accomplished
by an incremental hill-climbing procedure which uses the image-space processor and
information in the 3-D model to match the model to the axes derived from the image.

The basic idea here is to use the image-space processor to compute the
discrepancy between a given 3-D model orientation and the constraints imposed by the
stick-figure image. The $axis is set to an arbitrary initial orientation (slightly approaching or
receding from the viewer, based perhaps on shading cues) so that its projection is parallel to
the axis of the stick-figure image. Two degrees of freedom are left unconstrained at this
point: the dip of the $axis out of the image plane towards or away from the viewer, and the
unit vector associated with the $axis which determines the rotation about the $axis of the
object's local coordinate system (see figure 10). From a given disposition, the discrepancy
between the 3-D model's projected component axes and the corresponding sticks of the
image can be computed using the image-space processor, and their sum gives an indication
of the goodness of fit of this particular orientation of the $axis. A simple incremental
hill-climbing technique may now be used, that varies the dip of the $axis and the rotation about
it until a suitably good fit is found. Further discussion of the process illustrated in figure 10
may be found in the appendix.
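
The relaxation step can be sketched as a search over the two free parameters. What follows is an illustration under stated assumptions rather than the procedure of the appendix: the image plane is taken as x-y with depth along z, the $axis is assumed to project along the image x direction, component axes are given as direction vectors in the model's frame, and the discrepancy is measured as the squared distance between projected vectors, which is a stand-in for whatever measure the implementation actually used. All function names are hypothetical.

```python
import math


def viewer_direction(local, dip_deg, spin_deg):
    """Map a direction given in the model's frame into the viewer's frame."""
    d, r = math.radians(dip_deg), math.radians(spin_deg)
    a = (math.cos(d), 0.0, math.sin(d))      # the $axis, dipped out of the image plane
    u = (0.0, 1.0, 0.0)                      # image vertical, perpendicular to the $axis
    w = (-math.sin(d), 0.0, math.cos(d))     # a x u, completing the triad
    x = tuple(math.cos(r) * ui + math.sin(r) * wi for ui, wi in zip(u, w))
    y = tuple(math.cos(r) * wi - math.sin(r) * ui for ui, wi in zip(u, w))
    return tuple(local[0] * xi + local[1] * yi + local[2] * ai
                 for xi, yi, ai in zip(x, y, a))


def fit_error(dip_deg, spin_deg, model_axes, image_sticks):
    """Sum of squared distances between projected model axes and image sticks."""
    total = 0.0
    for local, stick in zip(model_axes, image_sticks):
        v = viewer_direction(local, dip_deg, spin_deg)
        total += (v[0] - stick[0]) ** 2 + (v[1] - stick[1]) ** 2
    return total


def hill_climb(model_axes, image_sticks, dip=5.0, spin=0.0,
               step=10.0, min_step=0.01, max_iters=10000):
    """Vary dip and spin in small increments, keeping only improving moves."""
    best = fit_error(dip, spin, model_axes, image_sticks)
    for _ in range(max_iters):
        if step < min_step:
            break
        moves = [(dip + step, spin), (dip - step, spin),
                 (dip, spin + step), (dip, spin - step)]
        err, cand = min((fit_error(d, s, model_axes, image_sticks), (d, s))
                        for d, s in moves)
        if err < best:
            best, (dip, spin) = err, cand
        else:
            step /= 2.0                      # no neighbour improves: refine the step
    return dip, spin, best


# Synthetic check: build "image sticks" from a known disposition, then relax.
model_axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.6, 0.0, 0.8)]
image_sticks = [viewer_direction(a, 20.0, 40.0)[:2] for a in model_axes]
start_error = fit_error(5.0, 0.0, model_axes, image_sticks)
dip, spin, final_error = hill_climb(model_axes, image_sticks)
```

The climb is only guaranteed not to make the fit worse; it can settle on a local optimum or on a mirror-image disposition, since this projection discards depth.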

This technique is incomplete as it stands, since the orthogonal projection
of a stick figure looks the same regardless of whether its head is nearer the viewer than its
tail. For animals like a horse, this ambiguity may be resolved by noticing whether the
forelegs or the hindlegs are shorter. For less familiar objects, obscuration or context clues
(what the object is on or in) are probably necessary to disambiguate the two possibilities.

Finally, comparisons with the angles of the image are only a partial source
of error information in the hill-climbing computation. Used alone, they would make the
computed disposition of the $axis too sensitive to slight variations in the dispositions of the
component axes in the image. We therefore include in the error calculation discrepancies
between the dip of the $spasar away from the viewer and the dip computed from the image
using perspective information (does the circumscribing cylinder thicken at the nearer end as
it should?), and length information (for this orientation of the $spasar is its projection too
long or too short compared with the image?). Our grasp of this part of the theory is
adequate only for simple images, and we shall develop it further elsewhere.

IV: Discussion 

The discussion falls naturally into two parts, one concerned specifically
with vision, and the other with the organization of information in a wider sense.

1. 3-D representation theory

There are five main points to our theory. They are:

(1) The 3-D disposition of an object is represented primarily by a stick-figure configuration,
where each stick stands for one or more axes in the object's generalized cone representation.

(2) This configuration is described by a loosely hierarchical assertional database, called a
3-D model representation. Use of this database is extremely free and flexible, and it can
support levels of description that cover the spectrum from very coarse to very fine detail. It
also satisfies the principle of graceful degradation, which states that partial information
should yield partial results.

(3) In order to be useful, this database has to be interpreted through an (essentially)
analogue mechanism, called the image-space processor. In its minimal implementation, this
processor can be thought of as maintaining the representation of one vector in a local
space-frame.

(4) The image-space processor's instruction set is small. Its most important features are:

(a) the ability to interpret an adjunct relation between the $axis and the $spasar; and
(b) the ability to relate object-centered coordinates to a viewer-centered frame of
reference.

(5) The image-space processor can deliver information about the lengths and orientations of
the appearance of the $axis and $spasar. These help the system to "rotate" its model into
the correct 3-D disposition relative to the viewer.

The immediate and most accessible prediction that follows from the
theory concerns the characterisation of Warrington & Taylor's (1973) "unconventional views".
According to our theory, the most difficult views to handle are those in which an important
axis is foreshortened, since in these cases straightforward segmentation fails to recover them
from the image. We therefore predict that these are the views that Warrington & Taylor
would label unconventional, and on which their patients will fail most easily. Such views
are by no means uncommon, and figures 2 and 11 contain two familiar examples.

It is hard but not impossible to derive detailed neurophysiological
predictions from the theory, particularly predictions about the likely implementation of the
image-space processor (Nishihara, in preparation). There are however several general
points about the theory that lead us to take it seriously as a model for psychology, and
which therefore encourage us to derive more detailed predictions. They are:

(1) Pipe-cleaner animals are almost as easily recognizable as are line-drawings of animals,
despite their very abstract relation to the original. This would not be surprising if
pipe-cleaner animals were in some sense extracted from the image during the normal course of its
interpretation (as our theory asserts), but it would be surprising if not.

(2) The loosely hierarchical structure of our 3-D models has many computational advantages
that are almost bound to be shared by the psychological representation, even if the
psychological representation is otherwise very different. The advantages include a variable
level of detail in the 3-D model system, and the flexibility with which different 3-D models
may be accessed and combined to form new models. If a system has 3-D models for a horse
and for a man, it will be able to build the description of a centaur.

(3) An important part of the theory is the simplicity of the image-space processor. The only
requirements are that it be able to manipulate one vector in a space-frame, and relate the
specification in that frame to one in the viewer-centered frame. By using the stick-figure
representation, the essentials of the spatial organization of a shape may be manipulated at
very low computational cost.

(4) The mechanisms of the theory can handle 3-D shapes, and so are inherently powerful
enough to describe 2-D patterns, such as the configuration of features on a face. The only
requirement is that such patterns should be described relative to axes that are constructed
within them, since the structure of a 3-D model depends on specifying positions in this way.


It is therefore important for the theory that axes be established early in our perception of
2-D figures. Figure 12 provides positive evidence on this point. In the top row, the shapes
are seen as squares, whereas along the diagonal, they are seen as diamonds. The diagonal
axis is therefore being constructed during the analysis of this pattern; it influences, and
therefore probably precedes, the description of the shapes of the local elements.

(5) The theory has been implemented and works well for simple images (see the appendix).
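Point (3) can be made concrete with a small sketch. It assumes that a space-frame is given
by a unit principal axis and a unit zero-girdle direction perpendicular to it, both expressed
in viewer-centered coordinates, and that a stick's direction is specified in that frame by an
inclination away from the axis and a girdle angle about it. The function names and the angle
conventions are illustrative assumptions, not a description of the authors' program.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def to_viewer(axis, girdle_zero, incl_deg, gird_deg):
    """Map a direction specified in a space-frame to viewer-centered coordinates.

    axis        -- unit vector along the frame's principal axis (viewer coordinates)
    girdle_zero -- unit vector perpendicular to axis, marking girdle angle 0 (assumed)
    incl_deg    -- inclination of the stick away from the axis, in degrees
    gird_deg    -- girdle angle of the stick about the axis, in degrees
    """
    third = cross(axis, girdle_zero)           # completes the right-handed frame
    i, g = math.radians(incl_deg), math.radians(gird_deg)
    # Components of the stick along (axis, girdle_zero, third).
    c = (math.cos(i), math.sin(i) * math.cos(g), math.sin(i) * math.sin(g))
    return tuple(c[0] * axis[k] + c[1] * girdle_zero[k] + c[2] * third[k]
                 for k in range(3))

# A stick inclined 90 degrees at girdle angle 0 points along the zero-girdle direction.
print(to_viewer((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 90.0, 0.0))
```

Only one vector is transformed at a time, which is the sense in which the cost of such a
device stays low.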

Mental rotation experiments

In 1971, Shepard & Metzler (1971) created a set of images by rotating and
reflecting simple objects made of cubes (figure 13). They found that the time taken to
decide whether two such images were of identical objects, rather than objects that differed
by a reflexion, varied linearly with the angle through which one object must be rotated in
3-space to become aligned with the other. This finding revived interest in "mental imagery"
and in analogue processes in perception (Cooper & Shepard (1973), Metzler & Shepard (1974),
Shepard (1975)). In addition, Kosslyn (1975) has published evidence for an analogue
component to the processes that interpret mainly two-dimensional structures, like faces and
maps.

The significance of such experiments is controversial (but not the
results). Part of the reason for the controversy seems to have been some difficulty in seeing
how an "analogue" process could benefit the computations that underlie perception and
recognition. We believe that the present theory shows a way in which such a mechanism
could be useful. It asserts that there is indeed an analogue component to the process, namely
the image-space processor, and that it operates on the sticks in a 3-D model. The linearity
that Shepard et al. regard as significant is however not a deep consequence of our theory,
merely the signature of one particularly simple way of implementing it. In the language of
Marr & Poggio (1976b), the linearity is a consequence more of the mechanisms that are used
than of the underlying nature of the computation.

Broadly speaking, if our theory is taken as a psychological model, it
predicts three stages in the assignment of 3-D orientation to views that are not
unconventional. The stages are: (a) A startup period, during which the axes are obtained
from the image, the 3-D model database is accessed, and the two homology problems are
solved. (b) An incremental process, during which the stored 3-D model is relaxed onto the
axes being delivered from the image. This process uses the principal axis together with the
two or three other most suitable ones, and in its simplest incremental implementation the
time for relaxation will vary roughly linearly with the 3-D angle through which the stored
model's space-frame is rotated. (c) Finally, when the best 3-D orientation has been found,
the remaining axes in the model are bound to the image, and fine adjustments made to
their positions and sizes.

The same computational theory certainly has other equally viable
implementations that do not exhibit a linear dependence on the angle. In one of these
implementations, the angle through which the model's frame is rotated at each increment is
half the angle between its present position and the currently predicted final state. In this
implementation, the time to settle would vary with approximately the logarithm of the 3-D
angle. Such a system does not have so starkly simple an image-space processor as the linear
one, but its requirements are still modest relative to what a digital electronic computer can
provide. It must also be borne in mind that unless the subject is very familiar with the
objects being recognized, the interaction between the image, the image-space processor, and
the 3-D model database may be extended and complex. In such cases, any linear dependence
on angle could be masked completely by the processor accessing successively more detailed
3-D models. This is particularly true if the subject is presented with an unconventional
view of an unusual or unfamiliar object, an expectation that suggests several experiments.
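The contrast between the two implementations can be illustrated by counting increments. The
sketch below is only a toy model: the fixed step of 5 degrees and the settling tolerance of
5 degrees are arbitrary assumptions, and each increment is taken to cost one unit of time.

```python
import math

def steps_fixed(angle_deg, step_deg=5.0):
    """Increments needed when the frame turns by a fixed step each time."""
    return math.ceil(angle_deg / step_deg)

def steps_halving(angle_deg, tolerance_deg=5.0):
    """Increments needed when each step halves the remaining angle."""
    steps = 0
    remaining = angle_deg
    while remaining > tolerance_deg:
        remaining /= 2.0
        steps += 1
    return steps

for angle in (20, 40, 80, 160):
    print(angle, steps_fixed(angle), steps_halving(angle))
# 20 4 2
# 40 8 3
# 80 16 4
# 160 32 5
```

Doubling the angle doubles the count for the fixed step but adds only one increment for the
halving rule, which is the linear versus logarithmic dependence described in the text.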

If one bears this caveat in mind, however, only one of the findings
reviewed by Shepard (1975, item H page S{B) is unexpected. It comes from Cooper &
Shepard (1973b condition O), who showed that advance information giving the orientation
but not the identity of the object to be presented is not sufficient to enable subjects to
prepare for it. One might have expected that subjects could rotate their $axis to the
appropriate orientation, and leave it there to be bound to the principal axis of a 3-D model
when the image was presented. In order to incorporate this finding, we would need to
assume (for example) that the image-space processor cannot be run unless bound to a 3-D
model (even if only of an arrow), and that whenever the $axis is bound to a radically new
3-D model, the image-space processor is reset. There are some other grounds for wanting
this. The space-frame in the image-space processor needs more than one direction to define
it, and trying to construct a space-frame round a given vector can lead to problems if the
3-D model is not simple. Secondly, in the real world, one rarely sees two objects at the same
point in the field of view. Therefore, to change to a new 3-D model almost always requires
a change in the direction of gaze. In order to compensate for this in a minimal
implementation, the $axis and $spasar would have to be set to axes in the starting frame, in
order to carry out the primary rotations that allow for the angle of gaze. These arguments
are however weaker than the arguments that support the rest of the theory.

Before we leave the discussion of the visual aspects of the theory, it is
appropriate to note that the 3-D model representation is not without its disadvantages.
Firstly, it is based on the structural axes of a shape, and some attempt at extracting them
must be made before the mechanisms of the theory can be invoked. To do so requires a
great deal of pre-processing of the image, and the theory associated with this is only
beginning to be worked out (see Marr & Poggio 1976b for a brief review). For views in
which a structural axis is foreshortened, this pre-processing may be completely unable to
deliver the correct axes. On such views, a system that operates according to the present
overall theory will be severely disadvantaged. It is not clear whether other methods exist
that would be more successful.

Finally, the criticism about the absence of uniqueness, that we made of
Baumgart's system for the representation of shapes by polyhedral approximation, sometimes
applies to the generalized cone representation. For example, consider a doorway. The
natural axis of most doors is vertical, because they are higher than they are wide. This is
not always true, however, and it is perfectly possible to represent a doorway by an axis
parallel to the width of the door, or even one parallel to its thickness. For most purposes,
there is little difference between using the height and using the width as the principal axis,
but using the thickness may introduce an important new way of looking at the space the
door occupies, since when arranged in this direction, the $spasar carries information about
the direction that is involved in passing through it. In other words, the analysis and use of
holes may depend to a considerable extent on using the $spasar to define what "through" a
hole means. Moreover, we feel that many of the problems of representing and manipulating
the space immediately around the viewer can be handled conveniently and efficiently using
a mechanism like the image-space processor.

2: Broader issues concerning the representation of knowledge

Following the tradition of Bartlett (1932), Minsky (1975) observed that the
"chunks" of reasoning, language, memory and perception ought to be larger and more
structured than most theories in artificial intelligence and psychology allow. This idea is
much more attractive than it is easy to realise, and two factors can be identified as mainly
responsible for the difficulty. The first is: what are the chunks? To answer it, one must
know how to represent a piece of knowledge for the purpose at hand, and much work in
artificial intelligence is devoted to asking this question in different domains. Sometimes it is
answered with conspicuous success (Moses W MACSYMA, Shortliffe 1976 MYCIN,
Duffield et al. 1969 DENDRAL, Sussman & Stallman 1975 EL).

The second factor is the question of flexibility. If all one's knowledge
resides in canned chunks, little room remains for variations in a scenario that are inevitable
in each of its real-world instances. This factor causes particular difficulties in domains that
are ambitiously near to real-world situations, like Schank's (1975) restaurant scenario. Its
effect is to leave these scenarios unable to deal with irregularities.

In the present theory, we propose that the central description of shape is
based on the 3-D model representation. The desired flexibility is achieved by modularity
within the representation, which allows 3-D models to be combined as the image dictates,
and by using the 3-D model catalogue more as an aid to building the current description
than as a set of inviolate subunits that must be assembled unchanged in a rigid way.

The other point that we believe may be important about the theory is the
way it embodies Minsky's assertion, that the overall structure of a situation or shape is of
importance to the way its details are recognised and their organization represented. The key
idea here is the use of coarse overall descriptions of a shape to help extract new information
from the image, which in turn enables the 3-D models involved in its description to be
specialised further so that yet more can be read from the image. Thus, 3-D models for the
overall structure of a shape set up a context of spatial constraints between otherwise
unrelated axes in the image, which then allow specific local "deductions" to refine the
details, possibly causing the overall description to be abandoned. This process is directly
analogous to the situation in Sussman & Stallman's (1975) program for understanding
electronic circuits, where a "high-level" description like "voltage-divider" becomes attached to
part of a circuit, relating components by local laws that are special and informative, and
which allow constraints on the behaviour of that part to be stated accurately and concisely.
In these two domains, these phenomena seem to capture the essence of what makes Minsky's
(1975) article so stimulating, although we feel that the interplay between different levels of
description, which forms a crucial part of the computation, has yet to receive a satisfactory
general formulation. In any case, the important feature of these two examples is that they
specify precisely the information contained in the high-level descriptions. Discussions that
consider only possible implementation mechanisms (frames, semantic networks, property lists,
Conniver methods, actors etc.) are not useful for deciding how information should be
represented in a fresh domain.

The explicit nature of these high-level organising structures (the
quadruped, the voltage-divider) stands in sharp contrast to methods based on cooperative
phenomena, like the stereopsis theory of Marr & Poggio (1976a), in which the higher-level
"holistic" organising structure of the computation remains an implicit, not an explicit, aspect
of the network by which it is implemented.

There may be an interesting connection between the specific database
organization that is required by our theory, and a recent study of human semantic memory.
The organisation that makes it possible to carry out the construction of a gradually more
specific 3-D model representation is the ordering of the 3-D model catalogue by increasingly
specific shape. Thus the access sequence for 3-D models during the recognition of (say) a
mallard-shape would often be approximately:

small-blob-shape -> duck-shape -> mallard-shape

This provides an interesting functional basis for structuring the 3-D model catalogue
according to rules very similar to those exhibited by Warrington (1975), in a recent and
ingenious study of the structure of semantic memory.
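An ordering by increasingly specific shape can be sketched as a chain of parent links, so
that reaching a specific model means passing through its coarser ancestors first. The entry
names below (apart from mallard-shape and duck-shape) and the single-parent structure are
hypothetical; the memo does not spell out how the catalogue is indexed.

```python
# Hypothetical fragment of a catalogue: each entry names its less specific parent.
PARENT = {
    "mallard-shape": "duck-shape",
    "duck-shape": "coarse-shape",
    "coarse-shape": None,
}

def access_sequence(name, parent=PARENT):
    """Return the coarse-to-specific chain of models leading to `name`."""
    chain = []
    while name is not None:
        chain.append(name)
        name = parent[name]
    return list(reversed(chain))

print(" -> ".join(access_sequence("mallard-shape")))
# coarse-shape -> duck-shape -> mallard-shape
```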

Finally, we feel that a simple mechanism along the lines of the image-space
processor would be of great benefit to a motor control system. At some level, a motor
system must have access to a representation of body-space in which distances, directions and
trajectories are computed and stored in a form closely related to what visual information can
provide. Yet to execute a motor action, the commands must eventually be couched in terms
of lengths, tensions and joint angles. A mechanism along the lines of the image-space
processor could provide a link between the two, at low computational cost.


Figure 14. This figure exhibits the information contained in 3-D models currently listed in
the restricted 3-D model catalogue used by our present implementation. Each 3-D model,
referenced by its $name, has an associated width and a list of relations among its component
axes. This list of relations specifies the relative spatial dispositions of the components, and
indicates a 3-D model for each one. The accompanying stick figures show the appearance
of these components relative to one another from a particular vantage point.


[Figure 14 body: for each catalogue entry, a WIDTH symbol and a RELATIONS list of adjunct
relations (attributes include POS, GIRD, INCL and SIZE), with an accompanying stick figure.]


Appendix: an implementation of the theory 

In an ideal situation, a theory of visual information processing would
consist entirely of well-defined, circumscribed results, accompanied by proofs of existence
and uniqueness, with enough background to show that the results obtained are in fact those
that are important for visual information processing. Marr (1976a) labelled such theories
Type 1. If this were always the case, there would be considerable interest in, but no serious
need for, implementing the theory on a computer. In the present state of the art one is rarely
so fortunate, since even when an individual module can be given a Type 1 theory, the
interactions between it and other modules cannot be satisfactorily analyzed until the other
modules have themselves been graced with Type 1 theories. This is to some extent the
situation here. The core of the present theory is of Type 1: the 3-D representation is
well-defined, and the image-space processor is precisely formulated. But in analyzing the
interactions between the 3-D module itself and other visual or non-visual processes that pass
it clues, many different kinds of information have to be taken into account.

It is therefore important to implement a theory such as this, and writing
its implementation has proved an important technique for clarifying our ideas and testing
different approaches to carrying out a process. For example, algorithms for accessing the
3-D model catalogue are peripheral to the 3-D representation theory, but they are essential to
a program that implements it. In our present implementation these algorithms are quite
primitive, because the main focus of our attention has been on the image-space processor
and on relaxation mechanisms. We have postponed the development of a more
psychological candidate for the catalogue indexing mechanisms until the important access
paths into the catalogue are more clearly defined.

The purpose of this appendix on the implementation is therefore to
clarify some of the concepts peripheral to the theory, and to lend substance to the notions set
out in it by exhibiting them at work. We make no claims that the implementation we
describe here is optimal, and it is certainly not unique.

Database conventions

Each 3-D model in our current implementation is organized around a
special name such as $quadruped, $limb, or even $0001, and we call these names $names
(dollar-names). Each $name specifies a memory location in the computer where the various
shape informations associated with the particular 3-D model are stored, and the $name is
used to reference that information. Many of the $names in this appendix, such as
$quadruped or $limb, are mnemonics for the shape the associated 3-D model represents.
This clarifies the presentation, but is of no other significance.

Figure 14 exhibits the information contained in 3-D models currently
listed in the restricted 3-D model catalogue used by our present implementation. Each 3-D
model, referenced by its $name, has an associated width and a list of relations among its
component axes. This list of relations specifies the relative spatial dispositions of the
components, and indicates a 3-D model for each one. For example the first relation in the
$primate template is

{tpTiWft jtor W gird fit fxitl jV /V ewiinf S Hit N), 


TABLE 1a


Representation of angle and of position
Directions and angles that occur in an adjunct relation are expressed in
a vocabulary of symbols that define a value and a tolerance. The longer the
symbol, the more accurately it specifies a value. Tables 1a & 1b define the values
and tolerances of all symbols that occur in the figures.


N M S E 



1,25 

1. 75 

* 

« 

jpr-r 1 leM 

FOE. 

*.* 

t. i 

1.5 

i 

c#ntir 


« 

■ -2$ 

*.?5 

• 

l«u*r 41*1t 


* 5 , a 

US* 

-155-1 

-15.* 

UQBir Mull 

C Lfti: 

it 

9E.a 

JI9.fi 

-90,1 

rantir 


-*a.i 

*6-0 

335. • 

-1JS.9 

louiir limit 


15,0 

13*.t 

* 

• 

Lp-j-ar Melt 

IHCL 

•.* 

99. 0 

lit.# 

* 

r<F,1ir 


* 

*6. 0 

ik.i 

* 

l9***r Melt 


*S,I 

155.5 

-us.# 

-*5-0 

MTF*r Melt 

Enas 

r. i 

SI.B 

L'.R.fl 

-9fi.fi 

cielur 


a* 

*5, 6 

m.fi 

-13l.fi 

llnr 1 leH 



s 

E 

H 

M 



6. fit 

1.12 

1,32 

fi. 09 

mpir ll*M 

SfllB 

«,#I 

9. 97 

fi.i 

I.U 

i*nt*r 


#.« 

9- fit 

1-12 

#.12 

1«U*F 11*1k 


i.JJ 

l.f 

1,5* 

*.*fi 

Ijjpir 19*1 t 

SIZE 

fi.ll 

fl.16 

l,fi 

2,71 

e«n1 nr 


#,# 

1 .22 

1.0 

1.6* 

Iwir Milt 


#,«« 

1.21 

• -■G5 

1.7* 

upppr iFnlt 

UIDTH 

t.H 

fi.lt 

fi-t 

L.2S 

e«e1 *r 


M 

*. it 

1.21 

1.66 

lg.fr Melt 


TABLE 1b



HN 

NU 

MM 

Hi 

91 


EE 

IN 



1-12 

*-37 

(.62 

fr,t? 

* 

• 

4 

* 

upptr 1,iM 

PM 

4.4 

(.31 

(.5 

0.75 

).£ 

* 

4 

4 



* 

VrlZ 

1.37 

1-52 

1.47 

• 

f 

4 

1 twr 1 h* 1 1 


72.5 

H7.5 

1124 

157,5 

-117.5 

-112,5 

-*7,5 

-72,4 

tippnr till; 

GIRD 

*-* 

*5.1 

51.4 

ail.* 

111.4 

-1)5.4 

-01,0 

-45. 9 

c*M*t 


-27-S 

27-5 

tr.i 

‘ 132-S 

1(7.5 

-147.5 

-1IJ4 

-*7,5 

lo>.f Mhll 


n.s 

B7,5 

112.6 

157,5 

4 


± 

4 

upptr 111 

tNtl 

t.t 

il-l 

va.t 

UK. 6 

IBB.? 

* 

* 

f 



* 

27. S 

£7.5 

112,5 

757.1 

■ 

* 

4 

3 nm r II«11 


2Z,5 

17.S 

1324 

E57.5 

-147.1 

4U-S 

-57,5 

-27,5 

Li^pir Malt 

EHD& 

a, a 

45,* 

3M 

L35. 5 

lll-l 

-135.1 

-M.( 

-45. * 

t*nt*r 


-72, i 

27.11 

£7.5 

217-5 

157.1 

-117.1 

-112-5 

-*7,5 

loir 1 ini 1 


59 

SE 

EE 

EH 

HN 

m 

MM 

MS 


0. ID 

4-95 

0.79 

1. Ll 

4.35 

9*2 

4.49 

I- IS 

L-npnr H n H 

fl. BJ 

4,94 

1.47 

1-17 

t,7 

4.37 

45* 

t-63 

Cinttr 

a,* 

(.» 

9.« 

0.0t 

*. IS 

B- 25 

0.42 

4.14 

1®w*r 1 Ini t 

R. 17 

4,21 

1.(7 

1.77 

1.24 

7-11 

3.44 

5.75 

uppir 1 > nil 

*, 13 

4.27 

9.31 

4,9 

1.4 

1.5* 

2.71 

*,*« 

C fr 1 tr 

a,a 

4-27 

«.21 

4.47 

1,77 

1,24 

2,11 

J.*3 

lour Unit 

a.49 

4-11 

0.L« 

4.3) 

Ml 

* 1* 

1,34 

2.3 

U|tfi*r 1 ml l 

a. (5 

4.91 

i.M 

4-2* 

1 .* 

4.55 

J-M 

2,79 

C*"1 l-r 

4.4 

4-99 

4,11 

4.14 

Ml 

4-51 

f.H 

1. 33 

1 CHir 1 Ini 1 


M LOTH 


Figure 15. The information supplied by earlier visual processes consists of a collection of
two-dimensional stick descriptions, together with information about the thickness associated
with each, and their connectivity. The example shown here has been simplified to include
only the sticks for the top level 3-D model. The dollar name $0000 is the reference for a
new 3-D model that will eventually represent the shape of this stick figure. The FIGURE
property of $0000 relates the organization found in the image to the structure required of a
3-D model, indicating syntactically that stick 0 is the top-level single axis description of the
overall shape, stick 1 is the principal component of this shape, and sticks 2 through 7 are its
auxiliary axes. The table specifies the angular locations of the end-points of each of these
sticks in a viewer-centred coordinate system, along with their thicknesses.



IDE 


r2 n 

6" 


O 5 

-e° 


12* 


IBRRfl 







FIGURCi ES 

ID £2) (31 

£53 m 

£71 > 



PACKET: TRUE 




st i-ckjS 1 


end a 



end & 



0 

9 

sjicfth 

0 

<p 

u 1 dth 

0 

S3,9 

-3,1 

4,3 

V, 5 

2.0 

5,1 

1 

S3,7 

-3.1 

3,a 

87.3 

2.0 

3.1 

2 

85.9 

Z,B 

5.2 

$3,3 

10,3 

5.4 

3 

63.9 

-3.5 

i.8 

92.3 

-2,9 

1,7 

4 

83. 5 

-2.7 

1,8 

32.4 

-Z, + 

1,7 

5 

67. e 

I,G 

i.g 

9G.G 

1.5 

L.S 

b 

87.1 

2.5 

1,9 

55. i 

2.4 

1,8 

7 

S3.5 

™3, 2 

9,e 

64.7 

-4.3 

ff,E 


note: all values are in degrees













which specifies the disposition of the $torso cylinder relative to the $primate cylinder. The
$primate cylinder is the single cylinder representation of the whole primate shape, and the
$torso is one of its component cylinders. Notice that $torso is the dollar-name of another
3-D model. That is how the concatenation rule between 3-D models is implemented.
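The way a relation names another 3-D model can be sketched as a small data structure. The
class names, the field names, and every symbol value below are placeholders chosen for
illustration (they are not the entries of Figure 14); the only point carried over from the
text is that a relation refers to its component by dollar-name, so that models concatenate
through the catalogue.

```python
from dataclasses import dataclass, field

@dataclass
class Relation:
    parent: str      # dollar-name of the axis the component is placed relative to
    component: str   # dollar-name of the component's own 3-D model
    attrs: dict      # symbolic attribute-value pairs, e.g. {"pos": "W"}

@dataclass
class Model3D:
    width: str
    relations: list = field(default_factory=list)

# Hypothetical catalogue fragment; the symbols are placeholders.
CATALOGUE = {
    "$primate": Model3D("W", [
        Relation("$primate", "$torso", {"pos": "W", "gird": "N"}),
        Relation("$torso", "$limb", {"pos": "N", "gird": "E"}),
    ]),
    "$torso": Model3D("N"),
    "$limb": Model3D("E", [Relation("$limb", "$upper-limb", {"pos": "N"})]),
    "$upper-limb": Model3D("E"),
}

def expand(name, catalogue=CATALOGUE, depth=0):
    """List a model and, recursively, the models named by its relations."""
    lines = ["  " * depth + name]
    for rel in catalogue[name].relations:
        if rel.component != name:
            lines.extend(expand(rel.component, catalogue, depth + 1))
    return lines

print("\n".join(expand("$primate")))
```

Following the component names downward gives progressively finer detail, while any single
model stays small.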

The other information in this relation consists of attribute-value pairs,
such as pos W and gird N. These two pairs specify the position of the $torso cylinder along
the axis to be W (which means in the middle, between 0.25 and 0.75); and the
girdle-angle to be N (which means within 45 degrees of 0). The symbols N, S, NN,
NNNW etc. specify directions and tolerances, the longer symbols specifying a direction
more precisely than the shorter ones. Table 1 defines the values and tolerances of all the
symbols used here.
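The symbol vocabulary can be sketched as a lookup from symbol to a centre and a tolerance.
The text states only that N means within 45 degrees of 0; the remaining centres used below
(W at 90, S at 180, E at -90, and two-letter symbols at 45-degree steps with a tolerance of
22.5 degrees) are an assumed reading of Table 1, and longer symbols such as NNNW are not
covered.

```python
# Assumed centres in degrees; only the entry for N is confirmed by the text.
ONE_LETTER = {"N": 0.0, "W": 90.0, "S": 180.0, "E": -90.0}            # tolerance 45
TWO_LETTER = {"NN": 0.0, "NW": 45.0, "WW": 90.0, "SW": 135.0,
              "SS": 180.0, "SE": -135.0, "EE": -90.0, "NE": -45.0}    # tolerance 22.5

def decode(symbol):
    """Return (centre, tolerance) in degrees for a one- or two-letter symbol."""
    if symbol in ONE_LETTER:
        return ONE_LETTER[symbol], 45.0
    if symbol in TWO_LETTER:
        return TWO_LETTER[symbol], 22.5
    raise ValueError(f"symbol not covered by this sketch: {symbol}")

def matches(symbol, angle_deg):
    """True if angle_deg lies within the symbol's tolerance of its centre."""
    centre, tol = decode(symbol)
    diff = (angle_deg - centre + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    return abs(diff) <= tol

print(matches("N", 30.0), matches("NN", 30.0))   # True False
```

The same angle satisfies the short symbol but not the long one, which is the sense in which
longer symbols are more precise.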

The $torso is the principal axis of the $primate 3-D model, and the
remaining relations held in the model specify the dispositions of the auxiliary axes relative
to it. Here, there are six auxiliary axes, the f Arihf, and four $limbs.

The form of the input

The information supplied by earlier visual processes consists of a
collection of two-dimensional stick descriptions, together with information about the
thickness associated with each, and their connectivity. Figure A in the main text was
obtained from a grey-level image using the techniques described by Marr (1976b) and it
illustrates how information about axes may be obtained from an image. Figure 15 shows an
example that has been simplified by omitting the embedding relations. The dollar name
$0000 is the reference for a new 3-D model that will eventually represent the shape of this
stick figure. The FIGURE property of $0000 relates the organization found in the image
to the structure required of a 3-D model. The information held here indicates that stick 0 is
the top-level single axis model for the overall shape, stick 1 is the principal axis for the first
elaborated 3-D model and sticks 2 through 7 are the auxiliary axes for this model. In a
more detailed example, these auxiliary axes would themselves decompose into substructures.
It might be the case, for example, that stick 2 (the figure's bust) decomposed into two
component sticks, corresponding to the neck and head. If this added detail had been
included in the input data, (2) would be replaced by (2 (8) (9)) in the FIGURE property of
$0000.
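The replacement just described can be sketched with nested tuples. The encoding is an
assumption made for illustration: each stick is written as a tuple whose first element is
its number and whose remaining elements are its component sticks, with sticks 1 through 7
placed directly under stick 0. The numbers 8 and 9 for the neck and head sticks are likewise
assumed.

```python
# Assumed encoding of the FIGURE property: (stick number, *component sticks).
figure = (0, (1,), (2,), (3,), (4,), (5,), (6,), (7,))

def refine(node, stick, parts):
    """Return a copy of the tree in which `stick` gains the given component sticks."""
    number, children = node[0], node[1:]
    if number == stick:
        return (number,) + children + tuple((p,) for p in parts)
    return (number,) + tuple(refine(c, stick, parts) for c in children)

detailed = refine(figure, 2, (8, 9))
print(detailed)
# (0, (1,), (2, (8,), (9,)), (3,), (4,), (5,), (6,), (7,))
```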


Homologies and the primary catalogue access
The first step in the 3-D analysis of such a figure is to select an
appropriate 3-D model from the catalogue, and to match it to the incoming stick figure (the
two homology problems). This is done by computing estimates of the adjuncts between the
principal axis of the stick figure and its auxiliaries, and then selecting that 3-D model whose
adjunct relations are most similar to the estimated ones.
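One crude way to picture "most similar" is to count agreeing attribute values. The sketch
below is a deliberate simplification and not the authors' procedure: it assumes the
estimated relations have already been paired with the stored ones (which is itself one of
the two homology problems), and all symbol values are invented for the example.

```python
def similarity(estimated, stored):
    """Count attribute values shared by paired relations (a crude score)."""
    return sum(1 for est, ref in zip(estimated, stored)
                 for key in est if ref.get(key) == est[key])

def select_model(estimated, catalogue):
    """Return the dollar-name whose stored relations best match the estimates."""
    return max(catalogue, key=lambda name: similarity(estimated, catalogue[name]))

# Hypothetical stored relations, reduced to two attributes each.
catalogue = {
    "$quadruped": [{"pos": "N", "gird": "W"}, {"pos": "N", "gird": "E"},
                   {"pos": "S", "gird": "E"}],
    "$biped":     [{"pos": "N", "gird": "N"}, {"pos": "S", "gird": "N"},
                   {"pos": "S", "gird": "S"}],
}
estimated = [{"pos": "N", "gird": "W"}, {"pos": "N", "gird": "E"},
             {"pos": "S", "gird": "W"}]

print(select_model(estimated, catalogue))   # $quadruped
```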

If the radial coordinates of the end points of the sticks in figure 15 were
known, the image-space processor could be used to compute the required adjunct relations
directly. They are not, but it turns out that useful relations can be obtained this way by
first assuming that all the radial distances are the same, which is equivalent to interpreting
the image as if all its sticks lay perpendicular to the line of sight. This is the starting
configuration for the 3-D model axes, and as the processing continues, better values for the
radial coordinates will be established.

Figure 16 shows the result of translating this initial configuration into
adjunct relations via the image-space processor. Note that low resolution symbols have been
used in the computed relations, and that new 3-D models for each auxiliary axis have been
created. The girdle-angles depend upon the particular choice of zero girdle direction, which
is initially arbitrary. The only important thing about them now is that some of the relations
have gird E and some have gird W, which correspond to "above" and "below" the principal
axis. The position parameter at this point is reasonably accurate, up to possible reversal if
the wrong end of the principal axis has been taken as the zero end.
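
A minimal sketch of this coplanar first guess follows. It is not the image-space processor itself: the function name, the numeric outputs, and the mapping of the sign of a 2-D cross product onto the labels E and W are assumptions made for illustration. With all radial distances taken as equal, each stick is simply its image segment, so position, side, inclination and relative size can be read off in the image plane.

```python
import math

def coplanar_relation(principal, stick):
    """Estimate adjunct parameters assuming both sticks lie in the image plane.

    principal, stick: ((x0, y0), (x1, y1)) image segments; the first end point
    of `principal` is taken as its zero end, the first end point of `stick`
    as its attachment point.
    """
    (px0, py0), (px1, py1) = principal
    (sx0, sy0), (sx1, sy1) = stick
    ax, ay = px1 - px0, py1 - py0          # principal axis vector
    bx, by = sx1 - sx0, sy1 - sy0          # auxiliary stick vector
    plen = math.hypot(ax, ay)
    # Position of the attachment point along the principal axis (0 = zero end).
    pos = ((sx0 - px0) * ax + (sy0 - py0) * ay) / plen ** 2
    cross = ax * by - ay * bx              # sign tells which side of the axis
    dot = ax * bx + ay * by
    return {
        "pos": pos,
        "side": "E" if cross > 0 else "W",  # assumed labelling of the two sides
        "incl": math.degrees(math.atan2(abs(cross), dot)),
        "size": math.hypot(bx, by) / plen,
    }

axis = ((0.0, 0.0), (10.0, 0.0))
leg = ((2.0, 0.0), (2.0, 5.0))
print(coplanar_relation(axis, leg))
# Taking the wrong end of the principal axis as the zero end reverses the position.
print(coplanar_relation((axis[1], axis[0]), leg)["pos"])
```

The second call shows the polarity problem mentioned above: the same stick is reported at the opposite end when the principal axis is read in the other direction.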


Only positional values along the principal axis provide direct help for
selecting a 3-D model from the catalogue, and even this is subject to a possible polarity
error. The remaining parameters do however provide indirect help, even though they may
be severely distorted by the a priori assumption that the sticks are coplanar. For example,
although the inclinations, girdles and sizes may themselves be incorrect, certain interrelations
among them will be preserved; the neck will have a girdle angle of opposite sign to the
legs, and the legs will all have similar inclinations, girdles and sizes because they are
roughly parallel. This information is sufficient to select the $quadruped model from the
$biped, $bird, $primate and various $limb models, and the relations in $0000 can now
be associated with the relations held in $quadruped. This information is inserted into the
newly formed 3-D models under their TEMPLATE properties as shown in figure 17.
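
The role of the positional values, including the possible polarity error, can be sketched as follows. The toy catalogue, its entries and the scoring rule are invented for illustration only; the actual catalogue contents and the use made of girdle, inclination and size interrelations are not reproduced here.

```python
from collections import Counter

FLIP = {"N": "S", "S": "N"}  # reversing the principal axis swaps its two ends

def overlap(a, b):
    """Number of position symbols shared by two lists, counted as multisets."""
    return sum((Counter(a) & Counter(b)).values())

def select_model(observed_pos, catalogue):
    """Pick the catalogue entry whose auxiliary positions best match the observed ones."""
    flipped = [FLIP.get(p, p) for p in observed_pos]

    def score(model_pos):
        # Try both polarities of the principal axis and keep the better match,
        # then penalise a mismatch in the number of auxiliary sticks.
        matched = max(overlap(observed_pos, model_pos), overlap(flipped, model_pos))
        return matched - abs(len(model_pos) - len(observed_pos))

    return max(catalogue, key=lambda name: score(catalogue[name]))

# Hypothetical catalogue: POS symbols of each model's auxiliary axes.
TOY_CATALOGUE = {
    "quadruped": ["S", "S", "S", "N", "N", "N"],
    "biped": ["S", "S", "S", "N", "N"],
    "bird": ["S", "N", "N", "N"],
    "limb": ["N"],
}

# Illustrative observation: three appendages at each end of the principal axis.
observed = ["S", "N", "N", "S", "S", "N"]
print(select_model(observed, TOY_CATALOGUE))
```

Because both polarities are scored, a stick figure whose principal axis was read from the wrong end still reaches the same catalogue entry.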


Relaxation

The next step is to use information in the $quadruped model to compute
better estimates for the radii, beginning with the principal axis, stick 1. Figure 10 of the
main text shows how our program achieves this. A hill-climbing algorithm is used, where
the parameters to be adjusted are the radial coordinate of one of stick 1's end points, and the
zero girdle direction of the $axis which lies along stick 1. The $axis represents the current
attempt at matching the $torso to stick 1 and as the $axis is incrementally rotated, the
goodness of fit of the 3-D model is computed by placing the $spasar successively on the
stored relations of the $quadruped, and accumulating a similarity score between the
$spasar end-points and the associated sticks in the image. In the top row of figure 10, the
end points of the $axis are equidistant from the viewer for three successive orientations of
the zero-girdle direction. The "appearance" of the $quadruped is computed one axis at a
time, and is shown in lighter lines in the figure. The effect of rotating about the $axis does
not significantly improve the fit. In the bottom row, the radial value of the end of the
$axis has been improved, and now rotation about the $axis leads to a good alignment. This
sets a new estimate for the radial coordinates of stick 1, and now the image-space processor
is used to set new estimates for the radial components of the remaining sticks, based on
relations stored in the $quadruped.
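
The two-parameter search can be sketched as a simple hill climb. This is an illustration under stated assumptions, not the program of figure 10: projection is taken to be orthographic, the model is a handful of points in axis-centred coordinates, the fit is measured as a squared error to be minimised (rather than a similarity score to be maximised), and all names and numbers are hypothetical.

```python
import math

def project(model, L, r, phi):
    """Orthographic image of model points given in axis-centred coordinates.

    Stick 1 runs from (0, 0) to (L, 0) in the image; r is the assumed
    difference in radial (depth) coordinate between its two end points and
    phi is the rotation of the zero-girdle direction about the axis.
    Each model point is (a, u, v): a along the axis, u and v across it,
    all in units of the 3-D axis length.
    """
    length = math.hypot(L, r)  # 3-D length of the axis
    points = []
    for a, u, v in model:
        x = a * L - r * (u * math.sin(phi) + v * math.cos(phi))
        y = length * (u * math.cos(phi) - v * math.sin(phi))
        points.append((x, y))
    return points

def fit_error(model, image, L, r, phi):
    """Sum of squared image distances between predicted and observed end points."""
    predicted = project(model, L, r, phi)
    return sum((x - ix) ** 2 + (y - iy) ** 2
               for (x, y), (ix, iy) in zip(predicted, image))

def hill_climb(model, image, L, r=0.0, phi=0.0, step=0.5, tol=1e-6, max_iter=10000):
    """Adjust r and phi one step at a time, halving the step when nothing improves."""
    best = fit_error(model, image, L, r, phi)
    for _ in range(max_iter):
        if step <= tol:
            break
        neighbours = [(r + step, phi), (r - step, phi), (r, phi + step), (r, phi - step)]
        err, cand = min((fit_error(model, image, L, *c), c) for c in neighbours)
        if err < best:
            best, (r, phi) = err, cand
        else:
            step /= 2
    return r, phi, best

# Synthetic example: the "image" is generated from known parameters.
MODEL = [(0.2, 0.5, 0.0), (0.8, 0.0, 0.5), (0.5, 0.3, -0.4)]
IMAGE = project(MODEL, 10.0, 4.0, 0.7)
start = fit_error(MODEL, IMAGE, 10.0, 0.0, 0.0)   # coplanar starting configuration
r, phi, err = hill_climb(MODEL, IMAGE, 10.0)
print(start, err, r, phi)
```

The search starts from the coplanar configuration (r = 0) and, like any hill climb, may stop at a local optimum; the sketch guarantees only that the fit never gets worse.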


$0000
RELATIONS:
    ($0000 $0001 POS N GIRD E INCL N EMBG N EMBD S SIZE N)
    ($0001 $0002 POS S GIRD E INCL N EMBG S EMBD S SIZE N)
    ($0001 $0003 POS N GIRD W INCL N EMBG N EMBD S SIZE N)
    ($0001 $0004 POS N GIRD W INCL N EMBG S EMBD S SIZE N)
    ($0001 $0005 POS S GIRD W INCL N EMBG S EMBD S SIZE N)
    ($0001 $0006 POS S GIRD W INCL N EMBG S EMBD S SIZE N)
    ($0001 $0007 POS N GIRD N INCL W EMBG S EMBD S SIZE E)
WIDTH: N
FIGURE: (0 (1) (2) (3) (4) (5) (6) (7))
PACKET: TRUE
TEMPLATE: $CYLINDER

$0001
WIDTH: N
FIGURE: (1)
PACKET: $0000
TEMPLATE: $CYLINDER

$0002
WIDTH: N
FIGURE: (2)
PACKET: $0000
TEMPLATE: $CYLINDER

$0003
WIDTH: E
FIGURE: (3)
PACKET: $0000
TEMPLATE: $CYLINDER

$0004
WIDTH: E
FIGURE: (4)
PACKET: $0000
TEMPLATE: $CYLINDER

$0005
WIDTH: E
FIGURE: (5)
PACKET: $0000
TEMPLATE: $CYLINDER

$0006
WIDTH: E
FIGURE: (6)
PACKET: $0000
TEMPLATE: $CYLINDER

$0007
WIDTH: E
FIGURE: (7)
PACKET: $0000
TEMPLATE: $CYLINDER

Figure 16. The first step in the processing of the input information is the computation of a
model-centred description of the sticks. Radial information is required in order to use the
image-space processor to compute this description, but it is not supplied in the input. It turns
out however that useful relations can be obtained by assuming that the radial distances to
the end points of the sticks are the same. This is equivalent to assuming that all the sticks
of the image are in a plane perpendicular to the line of sight. The result of translating this
initial configuration into adjunct relations via the image-space processor using this
assumption is shown here. Note that low resolution symbols have been used in the
computed relations, and that new 3-D models for each auxiliary axis have been created.
The girdle-angles depend upon the particular choice of zero girdle direction, which is
arbitrary initially. The only important thing about them now is that some of the relations
have gird E, and some have gird W, which correspond to "above" and "below" the principal
axis. The position parameter at this point is reasonably accurate, up to possible reversal if
the wrong end of the principal axis has been taken as the zero end.


$0000
RELATIONS:
    ($0000 $0001 POS N GIRD E INCL N EMBG N EMBD S SIZE N)
    ($0001 $0002 POS S GIRD E INCL N EMBG S EMBD S SIZE N)
    ($0001 $0003 POS N GIRD W INCL N EMBG N EMBD S SIZE N)
    ($0001 $0004 POS N GIRD W INCL N EMBG S EMBD S SIZE N)
    ($0001 $0005 POS S GIRD W INCL N EMBG S EMBD S SIZE N)
    ($0001 $0006 POS S GIRD W INCL N EMBG S EMBD S SIZE N)
    ($0001 $0007 POS N GIRD N INCL W EMBG S EMBD S SIZE E)
WIDTH: N
FIGURE: (0 (1) (2) (3) (4) (5) (6) (7))
PACKET: TRUE
TEMPLATE: $QUADRUPED

$0001
WIDTH: N
FIGURE: (1)
PACKET: $0000
TEMPLATE: $TORSO

$0002
WIDTH: N
FIGURE: (2)
PACKET: $0000
TEMPLATE: $BUST

$0003
WIDTH: E
FIGURE: (3)
PACKET: $0000
TEMPLATE: $LIMB

$0004
WIDTH: E
FIGURE: (4)
PACKET: $0000
TEMPLATE: $LIMB

$0005
WIDTH: E
FIGURE: (5)
PACKET: $0000
TEMPLATE: $LIMB

$0006
WIDTH: E
FIGURE: (6)
PACKET: $0000
TEMPLATE: $LIMB

$0007
WIDTH: E
FIGURE: (7)
PACKET: $0000
TEMPLATE: $TAIL


Figure 17. The positional distribution of the adjunct sticks (three appendages at each end of
the principal axis), along with similarity relations derived from the gird, incl, size and width
parameters (four appendages, two on each end, are very similar while a remaining one is
very different), are used to select a general model from the 3-D model catalogue. In this
case $quadruped is selected, as indicated by the template property listed under
$0000. The homology matching is also carried out here, aligning template properties to the
components of $0000 and relating adjunct relations in $0000 to adjunct relations in
$quadruped (these latter assignments are not depicted here).


$0000
RELATIONS:
    ($0000 $0001 POS NN GIRD NN INCL NN EMBG SS EMBD SS SIZE NN)
    ($0001 $0002 POS SS GIRD NN INCL NW EMBG NN EMBD NN SIZE NW)
    ($0001 $0003 POS NN GIRD SS INCL WW EMBG EN EMBD SS SIZE NW)
    ($0001 $0004 POS NN GIRD SS INCL WW EMBG WS EMBD WS SIZE NW)
    ($0001 $0005 POS SS GIRD SS INCL WW EMBG EN EMBD NN SIZE NW)
    ($0001 $0006 POS SS GIRD SS INCL WW EMBG WS EMBD WS SIZE NW)
    ($0001 $0007 POS NN GIRD SS INCL WS EMBG EN EMBD SS SIZE EN)
WIDTH: NW
FIGURE: (0 (1) (2) (3) (4) (5) (6) (7))
PACKET: TRUE
TEMPLATE: $QUADRUPED


Figure 18. We see here the state of $0000 just after the completion of the relaxation process
depicted in figure 10. The adjunct relations have been recomputed by the image-space
processor using symbols with a slightly higher level of resolution.


$0000
RELATIONS:
    ($0000 $0001 POS NN GIRD NN INCL NN EMBG SS EMBD SS SIZE NN)
    ($0001 $0002 POS SS GIRD NN INCL NW EMBG NN EMBD NN SIZE NW)
    ($0001 $0003 POS NN GIRD SS INCL WW EMBG EN EMBD SS SIZE NW)
    ($0001 $0004 POS NN GIRD SS INCL WW EMBG WS EMBD WS SIZE NW)
    ($0001 $0005 POS SS GIRD SS INCL WW EMBG EN EMBD NN SIZE NW)
    ($0001 $0006 POS SS GIRD SS INCL WW EMBG WS EMBD WS SIZE NW)
    ($0001 $0007 POS NN GIRD SS INCL WS EMBG EN EMBD SS SIZE EN)
WIDTH: NW
FIGURE: (0 (1) (2) (3) (4) (5) (6) (7))
PACKET: TRUE
TEMPLATE: $GIRAFFE


Figure 19. The adjunct relations in figure 18 are used again to access a 3-D model from the
3-D model catalogue. This access results in the selection of the $giraffe 3-D model, based
largely on the lengths of the neck and legs relative to the torso, and the first stage of
recognition is complete.


Secondary catalogue access and recognition
Having found the 3-D model orientation that achieves the best fit, we
can now compute a new set of adjunct relations for $0000, the model that is being built.
These are shown in figure 18. Notice that we are now using symbols with a slightly higher
level of resolution. The 3-D model catalogue can now be accessed in search of a more
specific shape. This access results in the selection of the $giraffe 3-D model, based largely
on the lengths of the neck and legs relative to the torso, and the first stage of recognition is
complete. The final state of the 3-D model is shown in figure 19.

Acknowledgments: We thank Drew McDermott, Tomaso Poggio, and Kent Stevens for
valuable criticism, and Karen Prendergast for preparing the drawings. This article
describes work reported in M. I. T. A. I. Lab. Memo 341, and it was conducted at the
Artificial Intelligence Laboratory, a Massachusetts Institute of Technology research program
supported in part by the Advanced Research Projects Agency of the Department of Defense
and monitored by the Office of Naval Research under Contract number N00014-75-C-0643.


References

Adrian, E. D. 1941 Afferent discharges to the cerebral cortex from peripheral sense organs.
J. Physiol. (Lond.), 100, 159-191.

Agin, G. J. 1972 Representation and description of curved objects. Stanford A. I. Memo 173.

Allman, J. M., Kaas, J. H., Lane, R. H. & Miezin, F. M. 1972 A representation of the visual
field in the inferior nucleus of the pulvinar in the owl monkey (Aotus trivirgatus). Brain
Research, 40, 291-302.

Allman, J. M., Kaas, J. H. & Lane, R. H. 1973 The middle temporal visual area (MT) in the
bush baby, Galago senegalensis. Brain Research, 57, 197-202.

Allman, J. M. & Kaas, J. H. 1974a The organisation of the second visual area (V-II) in the
owl monkey: a second order transformation of the visual hemifield. Brain Research, 76,
247-265.

Allman, J. M. & Kaas, J. H. 1974b A visual area adjoining the second visual area (V-II) on
the medial wall of parieto-occipital cortex of the owl monkey (Aotus trivirgatus). Anat.
Rec., 178, 297-8.

Allman, J. M. & Kaas, J. H. 1974c A crescent-shaped cortical visual area surrounding the
middle temporal area (MT) in the owl monkey (Aotus trivirgatus). Brain Research, 81,
199-211.

Binford, T. O. 1971 Visual perception by computer. Presented to the IEEE Conference on
Systems and Control, Miami, in December 1971.

Blum, H. 1973 Biological shape and visual science (part I). J. theor. Biol., 38, 205-287.

Brodmann, K. 1909 Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien
dargestellt auf Grund des Zellenbaues. Leipzig: J. A. Barth.

Cajal, S. Ramon y, 1911 Histologie du système nerveux de l'homme et des vertébrés. 2 vols.
Paris: Norbert Maloine.

Cooper, L. A. & Shepard, R. N. 1973a The time required to prepare for a rotated stimulus.
Memory and Cognition, 1, 246-250.

Cooper, L. A. & Shepard, R. N. 1973b Chronometric studies of the rotation of mental
images. In: W. G. Chase (Ed.), Visual information processing. New York: Academic Press.

Critchley, M. 1953 The parietal lobes. London: Edward Arnold.

Duffield, A. M. et al. 1969 Applications of artificial intelligence for chemical inference, II:
Interpretation of low-resolution mass spectra of ketones. J. Am. Chem. Soc., 91, 2977-2981.

Hollerbach, J. M. 1975 Hierarchical shape description by selection and modification of
prototypes. M. I. T. Master's Thesis, to appear as M. I. T. A. I. Lab. TR-346.

Kosslyn, S. M. 1975 Information representation in visual images. Cognitive Psychology, 7,
341-370.

Luria, A. R. 1970 Traumatic aphasia. The Hague: Mouton.

Marr, D. 1976a Artificial Intelligence - a personal view. M. I. T. A. I. Lab. Memo 355.

Marr, D. 1976b Early processing of visual information. Phil. Trans. Roy. Soc. B. (in the
press).

Marr, D. 1976c Analysis of occluding contour. M. I. T. A. I. Lab. Memo 372.

Marr, D. & Poggio, T. 1976a Cooperative computation of stereo disparity. Science,
(submitted for publication). Also available as M. I. T. A. I. Lab. Memo 364.

Marr, D. & Poggio, T. 1976b From understanding computation to understanding neural
circuitry. In The Visual Field: Psychophysics and Neurophysiology. Neurosciences Research
Program Bulletin, E. Poeppel et al., Eds. (in the press).

Metzler, J. & Shepard, R. N. 1974 Transformational studies of the internal representation of
three-dimensional objects. In Theories of cognitive psychology: The Loyola Symposium. Ed.
R. Solso. Hillsdale, N. J.: Lawrence Erlbaum Assoc.

Minsky, M. 1976 A framework for representing knowledge. In: The psychology of computer
vision, Ed. P. H. Winston, pp. 211-277. New York: McGraw-Hill.

Minsky, M. & Papert, S. 1972 Artificial intelligence progress report. M. I. T. A. I. Lab.
Memo 252.

Moses, J. 1974 MACSYMA - the fifth year. SIGSAM Bulletin, ACM, 8, 105-110. See also
The MACSYMA reference manual, M. I. T. Laboratory for Computer Science, 545
Technology Square, Cambridge, Mass. 02139.

Nevatia, R. 1974 Structured descriptions of complex curved objects for recognition and
visual memory. Stanford A. I. Memo 250.

Schank, R. C. 1975 Conceptual information processing. New York: Elsevier.

Shepard, R. N. 1975 Form, formation, and transformation of internal representations. In:
Information processing and cognition: Loyola Symposium. Ed. R. Solso, pp. 67-122.
Hillsdale, N. J.: Lawrence Erlbaum Assoc.

Shepard, R. N. & Metzler, J. 1971 Mental rotation of three-dimensional objects. Science, 171,
701-703.

Shortliffe, E. H. 1976 Computer-Based Medical Consultations: MYCIN. New York:
American Elsevier Publishing Company, Inc.

Street, R. F. 1931 A Gestalt completion test: A study of a cross-section of intellect.
(Teachers College Contributions to Education, No. 481). New York: Teachers College,
Columbia University.

Ullman, S. 1976 Structure from motion. (M. I. T. Ph.D. Thesis in preparation).

Vatan, P. & Marr, D. 1976 Algorithms for the decomposition of a contour. In preparation.

Vinken, P. J. & Bruyn, G. W. 1969 Eds. Handbook of Clinical Neurology: Vol. 2,
Localization in Clinical Neurology (In association with A. Biemond). Amsterdam: North
Holland Publishing Co.

Warrington, E. K. & Taylor, A. M. 1973 The contribution of the right parietal lobe to object
recognition. Cortex, 9, 152-164.
Also remarks made by E. K. W. in a lecture given on Oct. 26th 1973 at the M. I. T.
Psychology Department.

Warrington, E. K. 1975 The selective impairment of semantic memory. Quart. J. exp.
Psychol., 27, 635-657.

Zeki, S. M. 1971 Cortical projections from two prestriate areas in the monkey. Brain
Research, 34, 19-35.

Zeki, S. M. 1973 Colour coding in rhesus monkey prestriate cortex. Brain Research, 53, 422-
427.


